Abstract. 3D object detection is a core task in environmental perception for autonomous driving. Current multi-modal methods, which fuse features from sensors such as LiDAR and cameras, have shown promise in improving detection performance. Nevertheless, they remain susceptible to calibration errors and noise interference in real-world scenarios. These issues cause suboptimal alignment and fusion of multi-modal features, degrading the model’s detection performance and generalization capability. To overcome these limitations, this paper proposes the DAD-Fusion method. Its dynamic alignment module generates a learnable offset field that adaptively corrects the spatial misalignment between LiDAR and camera bird's-eye-view (BEV) features. Concurrently, a diffusion model suppresses perceptual noise in multi-modal fusion through a progressive denoising mechanism, strengthening the feature representation. Extensive experiments show that DAD-Fusion reaches 71.1% mAP and 73.4% NDS on the nuScenes test set. To further validate its generalization capability, the model was also evaluated on the KITTI dataset, where it significantly outperforms the baseline method. In addition, the model generates optimized BEV features that can be shared end-to-end and jointly optimized with downstream tasks, contributing to more efficient and robust perception and decision-making systems.
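To make the offset-field idea concrete, the following is a minimal sketch and not the authors' implementation. It assumes a single-channel BEV map stored as a list of rows and a hand-set offset field, whereas the abstract describes offsets that are learned by the alignment module. The names `bilinear_sample` and `warp_with_offsets` are hypothetical and chosen only for illustration.

```python
import math


def bilinear_sample(grid, x, y):
    """Sample a 2D grid (list of rows) at fractional (x, y), clamping to the border."""
    h, w = len(grid), len(grid[0])
    x = min(max(x, 0.0), w - 1.0)
    y = min(max(y, 0.0), h - 1.0)
    x0, y0 = int(math.floor(x)), int(math.floor(y))
    x1, y1 = min(x0 + 1, w - 1), min(y0 + 1, h - 1)
    fx, fy = x - x0, y - y0
    top = grid[y0][x0] * (1 - fx) + grid[y0][x1] * fx
    bottom = grid[y1][x0] * (1 - fx) + grid[y1][x1] * fx
    return top * (1 - fy) + bottom * fy


def warp_with_offsets(camera_bev, offsets):
    """Resample so that output[y][x] reads camera_bev at (x + dx, y + dy)."""
    h, w = len(camera_bev), len(camera_bev[0])
    return [
        [
            bilinear_sample(camera_bev, x + offsets[y][x][0], y + offsets[y][x][1])
            for x in range(w)
        ]
        for y in range(h)
    ]


# Toy data (assumed): the camera activation sits one cell to the right
# of the LiDAR activation, mimicking a small calibration error.
lidar_bev = [[0.0, 0.0, 0.0, 0.0],
             [0.0, 1.0, 0.0, 0.0],
             [0.0, 0.0, 0.0, 0.0]]
camera_bev = [[0.0, 0.0, 0.0, 0.0],
              [0.0, 0.0, 1.0, 0.0],
              [0.0, 0.0, 0.0, 0.0]]

# Hand-set constant offset field (dx=+1, dy=0); a learned module would
# predict a per-cell offset from the two feature maps instead.
offsets = [[(1.0, 0.0) for _ in range(4)] for _ in range(3)]
aligned = warp_with_offsets(camera_bev, offsets)
```

In this toy case the warped camera map coincides with the LiDAR map. In a trained model the offsets would vary per cell and the sampling would run over multi-channel tensors, typically with a differentiable sampler so that the offsets can be learned by backpropagation.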
Taiguo Li
Hao Wang
X.S. Tang
Measurement Science and Technology
Li et al. studied this question.
www.synapsesocial.com/papers/68c188509b7b07f3a061200b — DOI: https://doi.org/10.1088/1361-6501/ae02b2