Multimodal fusion methods that combine multiple sensors provide strong support for 3D object detection. However, under adverse weather conditions such as rain, fog, snow, and intense glare, environmental factors degrade sensor data quality, leading to more false positives and missed detections. In addition, sensor modalities (e.g., LiDAR and cameras) inherently differ in information density, and fusing them directly can cause critical details in high-density data to be diluted by low-density data, thereby increasing errors. To address these issues, we propose a Semantic-Enhanced Bidirectional Multimodal Fusion (SeBFusion) framework. By introducing a semantic enhancement mechanism and a bidirectional fusion strategy, SeBFusion mitigates the impact of noise under adverse weather and alleviates information dilution in multimodal fusion. Specifically, SeBFusion first employs a virtual point generation and camera semantic injection module that selectively maps image semantic features into 3D space, producing semantically enhanced LiDAR features that compensate for the sparsity of the raw LiDAR point cloud. Then, for cross-modal interaction, we design a bidirectional cross-attention fusion module. This module estimates the confidence of each modality and adaptively reweights the bidirectional information flow, thereby reducing the risk of noise propagating across modalities and improving the robustness and accuracy of 3D object detection in complex environments. Experiments on adverse-weather versions of datasets such as KITTI-C and nuScenes-C validate the effectiveness of the proposed method. On the nuScenes-C dataset, it achieves 66.2% mAP under fog and 66.6% mAP under snow.
Jiao et al. studied this question.
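The confidence-weighted bidirectional cross-attention described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the abstract does not specify how confidence is estimated or applied, so the sketch assumes a single scalar confidence per modality (a sigmoid over a mean linear score) that gates the message sent from that modality to the other. The names `modality_confidence`, `bidirectional_fusion`, `w_lidar`, and `w_camera` are hypothetical, and learned query/key/value projections are omitted for brevity.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    shifted = x - x.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)


def cross_attention(query, context):
    """Single-head scaled dot-product attention: query tokens attend to context tokens.

    query: (N_q, D), context: (N_c, D). Returns a (N_q, D) message.
    Learned projections are left out (an assumption made to keep the sketch short).
    """
    d = query.shape[-1]
    weights = softmax(query @ context.T / np.sqrt(d), axis=-1)
    return weights @ context


def modality_confidence(features, w, b=0.0):
    """Hypothetical confidence estimator: sigmoid of the mean linear score.

    Returns a scalar in (0, 1); a real system would likely learn this per token or region.
    """
    score = float(np.mean(features @ w) + b)
    return 1.0 / (1.0 + np.exp(-score))


def bidirectional_fusion(lidar, camera, w_lidar, w_camera, b_lidar=0.0, b_camera=0.0):
    """Exchange information in both directions, gating each message by its source's confidence."""
    conf_lidar = modality_confidence(lidar, w_lidar, b_lidar)
    conf_camera = modality_confidence(camera, w_camera, b_camera)

    # Camera -> LiDAR: a low camera confidence (e.g. glare) shrinks what LiDAR receives.
    lidar_out = lidar + conf_camera * cross_attention(lidar, camera)
    # LiDAR -> camera: a low LiDAR confidence (e.g. heavy snow) shrinks what the camera receives.
    camera_out = camera + conf_lidar * cross_attention(camera, lidar)
    return lidar_out, camera_out
```

With this gating, a modality judged unreliable contributes almost nothing to the other branch, while its own features are still refined by the more reliable one. This is one simple way to limit noise propagation across modalities in the sense the abstract describes.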