What type of study is this?

This is a quantitative study.

October 3, 2025 | Open Access

Robust 3D Object Detection in Complex Traffic via Unified Feature Alignment in Bird’s Eye View

Key Points

BEVAlign achieves 71.7% mAP on the nuScenes benchmark and performs strongly on small and occluded objects in dense traffic.
The framework targets cross-modal misalignment caused by occlusions, depth estimation errors, and calibration deviations, supporting safer perception for intelligent vehicles.
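A Local Alignment module improves the camera-to-BEV transformation with graph-based neighbor modeling and dual-depth encoding, and a Global Alignment module aligns LiDAR and camera features in BEV with bidirectional deformable cross-attention.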
BEVAlign outperforms BEVFusion by 1.5% in mAP on the nuScenes benchmark.

Abstract

Reliable three-dimensional (3D) object detection is critical for intelligent vehicles to ensure safety in complex traffic environments. Recent progress in multi-modal sensor fusion, particularly between LiDAR and camera, has advanced environment perception in urban driving. However, existing approaches remain vulnerable to occlusions and dense traffic, where depth estimation errors, calibration deviations, and cross-modal misalignment are often exacerbated. To overcome these limitations, we propose BEVAlign, a local–global feature alignment framework designed to generate unified BEV representations from heterogeneous sensor modalities. The framework incorporates a Local Alignment (LA) module that enhances camera-to-BEV view transformation through graph-based neighbor modeling and dual-depth encoding, mitigating local misalignment from depth estimation errors. To further address global misalignment in BEV representations, we present the Global Alignment (GA) module, which comprises a bidirectional deformable cross-attention (BDCA) mechanism and CBR blocks. BDCA employs dual queries from LiDAR and camera to jointly predict spatial sampling offsets and aggregate features, enabling bidirectional alignment within the BEV domain. The stacked CBR blocks then refine and integrate the aligned features into unified BEV representations. Experiments on the nuScenes benchmark highlight the effectiveness of BEVAlign, which achieves 71.7% mAP, outperforming BEVFusion by 1.5%. Notably, it achieves strong performance on small and occluded objects, particularly in dense traffic scenarios. These findings provide a basis for advancing cooperative environment perception in next-generation intelligent vehicle systems.
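
The BDCA idea described in the abstract (dual queries, jointly predicted sampling offsets, feature aggregation in both directions) can be illustrated with a small sketch. This is not the authors' implementation. It assumes a single attention head, a joint query formed by concatenating the LiDAR and camera features at each BEV cell, plain linear maps for offsets and attention weights, bilinear sampling, and a residual connection. All function names, parameter shapes, and the `params` layout are hypothetical, and the CBR refinement stage is omitted.

```python
import math

def bilinear_sample(grid, x, y):
    """Sample a feature vector from grid[H][W][C] at continuous (x, y).
    Positions outside the grid contribute zeros."""
    h, w, c = len(grid), len(grid[0]), len(grid[0][0])
    x0, y0 = math.floor(x), math.floor(y)
    out = [0.0] * c
    for yi, wy in ((y0, 1 - (y - y0)), (y0 + 1, y - y0)):
        for xi, wx in ((x0, 1 - (x - x0)), (x0 + 1, x - x0)):
            if 0 <= yi < h and 0 <= xi < w:
                for ch in range(c):
                    out[ch] += wy * wx * grid[yi][xi][ch]
    return out

def matvec(m, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]

def softmax(z):
    """Numerically stable softmax over a list of floats."""
    top = max(z)
    e = [math.exp(v - top) for v in z]
    s = sum(e)
    return [v / s for v in e]

def deformable_cross_attend(query_grid, other_grid, w_off, w_attn):
    """One direction of the sketch: every BEV cell builds a joint query from
    both modalities, predicts K (dx, dy) offsets and K attention weights,
    samples other_grid at the offset positions, and adds the weighted sum
    to its own feature (assumed residual connection).
    w_off has shape [2K][2C]; w_attn has shape [K][2C]."""
    h, w, c = len(query_grid), len(query_grid[0]), len(query_grid[0][0])
    k = len(w_attn)
    out = []
    for y in range(h):
        row = []
        for x in range(w):
            joint = query_grid[y][x] + other_grid[y][x]  # concatenation
            offs = matvec(w_off, joint)
            weights = softmax(matvec(w_attn, joint))
            agg = [0.0] * c
            for i in range(k):
                s = bilinear_sample(other_grid, x + offs[2 * i], y + offs[2 * i + 1])
                agg = [a + weights[i] * v for a, v in zip(agg, s)]
            row.append([q + a for q, a in zip(query_grid[y][x], agg)])
        out.append(row)
    return out

def bdca_sketch(lidar, camera, params):
    """Run both directions: LiDAR attends to camera, camera attends to LiDAR.
    params is a hypothetical dict with keys "l2c" and "c2l", each holding
    a (w_off, w_attn) pair."""
    lidar_aligned = deformable_cross_attend(lidar, camera, *params["l2c"])
    camera_aligned = deformable_cross_attend(camera, lidar, *params["c2l"])
    return lidar_aligned, camera_aligned
```

With all-zero weights, every offset is zero and the attention is uniform, so each cell simply adds the other modality's feature at the same location. Non-zero offset weights let a cell pull features from shifted positions, which is how this kind of attention can compensate for spatial misalignment between the two BEV maps.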
