What question did this study set out to answer?

The research aims to improve 3D object detection accuracy under adverse weather conditions by addressing data quality issues.

March 21, 2026 | Open Access

Semantic-Enhanced Bidirectional Multimodal Fusion for 3D Object Detection Under Adverse Weather

Key Points

Developed a Semantic-Enhanced Bidirectional Multimodal Fusion (SeBFusion) framework.
Utilized a virtual point generation and camera semantic injection module for feature enhancement.
Implemented a bidirectional cross-attention fusion module to adaptively reweight information flow.
Achieved 66.2% mean Average Precision (mAP) on the nuScenes-C dataset under fog conditions.
Attained 66.6% mAP under snow conditions, demonstrating improved robustness in challenging environments.
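
The camera semantic injection idea in the key points above can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes points already expressed in the camera frame, a simple pinhole model, and a per-pixel class-score map. The names `project_to_pixel`, `inject_semantics`, and the `min_score` threshold (standing in for the "selective" mapping) are hypothetical, and virtual point generation is left out.

```python
import math
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, float, float]


def project_to_pixel(point: Point, fx: float, fy: float,
                     cx: float, cy: float) -> Optional[Tuple[int, int]]:
    """Pinhole projection of a camera-frame point; None if it lies behind the camera."""
    x, y, z = point
    if z <= 0.0:
        return None
    u = math.floor(fx * x / z + cx)
    v = math.floor(fy * y / z + cy)
    return u, v


def inject_semantics(points: Sequence[Point],
                     semantic_map: Sequence[Sequence[Sequence[float]]],
                     fx: float, fy: float, cx: float, cy: float,
                     min_score: float = 0.5) -> List[List[float]]:
    """Append image class scores to each point (assumed selection rule: keep
    scores only when the pixel's best class score reaches min_score)."""
    height = len(semantic_map)
    width = len(semantic_map[0])
    num_classes = len(semantic_map[0][0])
    enhanced = []
    for p in points:
        scores = [0.0] * num_classes  # default: no semantic evidence
        pixel = project_to_pixel(p, fx, fy, cx, cy)
        if pixel is not None:
            u, v = pixel
            if 0 <= u < width and 0 <= v < height:
                candidate = semantic_map[v][u]
                if max(candidate) >= min_score:
                    scores = list(candidate)
        enhanced.append(list(p) + scores)
    return enhanced
```

Points that project outside the image, fall behind the camera, or land on low-confidence pixels keep zero semantic scores, so unreliable image evidence is not copied into the point features.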

Abstract

Multimodal fusion methods leveraging various sensors provide strong support for 3D object detection. However, under adverse weather conditions such as rain, fog, snow, and intense glare, complex environmental factors can degrade sensor data quality, leading to increased false positives and missed detections. In addition, sensor modalities (e.g., LiDAR and cameras) inherently vary in information density, and directly fusing them can cause critical details in high-density data to be diluted by low-density data, thereby increasing errors. To address these issues, we propose a Semantic-Enhanced Bidirectional Multimodal Fusion (SeBFusion) framework. By introducing a semantic enhancement mechanism and a bidirectional fusion strategy, SeBFusion mitigates the impact of noise under adverse weather and alleviates information dilution in multimodal fusion. Specifically, SeBFusion first employs a virtual point generation and camera semantic injection module to selectively map image semantic features into 3D space, producing semantically enhanced LiDAR features to compensate for the sparsity of the raw LiDAR point cloud. Then, during cross-modal interaction, we design a bidirectional cross-attention fusion module. This module estimates the confidence of each modality and adaptively reweights the bidirectional information flow, thereby reducing the risk of noise propagation across modalities and improving the robustness and accuracy of 3D object detection in complex environments. Experiments on adverse-weather versions of datasets such as KITTI-C and nuScenes-C validate the effectiveness and superiority of the proposed method. On the nuScenes-C dataset, it achieves 66.2% mAP and 66.6% mAP under fog and snow conditions, respectively.
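
To make the confidence-reweighted bidirectional fusion described in the abstract more concrete, here is a minimal sketch in plain Python. It is an illustration under stated assumptions, not the paper's module: attention uses raw features without learned query, key, or value projections, and the per-modality confidences `lidar_conf` and `camera_conf` are supplied as scalars in [0, 1] rather than estimated by a network. All function names are hypothetical.

```python
import math
from typing import List, Tuple

Vector = List[float]


def softmax(xs: Vector) -> Vector:
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attend(queries: List[Vector], keys_values: List[Vector]) -> List[Vector]:
    """Scaled dot-product attention: each query token receives a weighted
    mix of the other modality's tokens."""
    scale = math.sqrt(len(queries[0]))
    out = []
    for q in queries:
        logits = [sum(a * b for a, b in zip(q, k)) / scale for k in keys_values]
        weights = softmax(logits)
        out.append([sum(w * kv[d] for w, kv in zip(weights, keys_values))
                    for d in range(len(q))])
    return out


def fuse_bidirectional(lidar: List[Vector], camera: List[Vector],
                       lidar_conf: float, camera_conf: float
                       ) -> Tuple[List[Vector], List[Vector]]:
    """Residual update in both directions; each message is scaled by the
    confidence of the modality it comes from (assumed gating rule)."""
    cam_to_lidar = cross_attend(lidar, camera)
    lidar_to_cam = cross_attend(camera, lidar)
    fused_lidar = [[x + camera_conf * m for x, m in zip(tok, msg)]
                   for tok, msg in zip(lidar, cam_to_lidar)]
    fused_camera = [[x + lidar_conf * m for x, m in zip(tok, msg)]
                    for tok, msg in zip(camera, lidar_to_cam)]
    return fused_lidar, fused_camera
```

With `camera_conf` set to 0 (for example, a camera blinded by glare), the LiDAR features pass through unchanged, which is the behavior the abstract attributes to reweighting: a degraded modality contributes less to the other stream.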

Cite This Study

Jiao et al. studied this question.

synapsesocial.com/papers/69be3be16e48c4981c679c1e https://doi.org/10.3390/app16062943
