Rapid progress in sensing technologies, artificial intelligence, and embedded computing has significantly accelerated the development of autonomous vehicles. Among the core challenges of higher-level driving automation, reliable environmental perception remains one of the most critical. This review presents a systematic PRISMA-based analysis of multimodal sensor technologies and fusion architectures applied in autonomous driving, based on 66 peer-reviewed studies published between 2014 and 2025. It examines the operational characteristics, advantages, and limitations of the major sensing modalities: cameras, LiDAR, radar, ultrasonic sensors, and GNSS/IMU-based localization systems. Particular attention is given to multimodal fusion strategies, covering early, mid-level, high-level, and transformer-based architectures that combine complementary sensor information to improve perception robustness and decision reliability. The review further synthesizes current evidence on performance under adverse environmental conditions, benchmark validation practices, real-time computational constraints, and the growing role of functional safety frameworks such as ISO 26262 and SOTIF (ISO 21448). Emerging research directions, including 4D radar, self-supervised long-range fusion, foundation models, and cooperative V2X perception, are also discussed. The findings indicate that multimodal sensor fusion is a highly effective architectural strategy for improving scalability, fail-operational robustness, and certifiable safety in autonomous driving systems, particularly in higher-level automation scenarios. Future research should focus on uncertainty-aware fusion, explainable cross-modal reasoning, large-scale real-world validation, and efficient hardware–software co-design to support robust SAE Level 4–5 vehicle autonomy.
Viktor et al. studied this question.