What question did this study set out to answer?

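This research aims to improve camera-based, multi-view 3D object detection for autonomous driving by addressing two problems that arise when image features are projected and fused into bird's-eye-view (BEV) space: redundant background activations and geometric misalignment across views.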

June 5, 2026 · Open Access

SpecBEV: An End-to-End BEV 3D Object Detection Algorithm Based on Frequency-Domain Analysis and Geometric Alignment

Key Points

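The authors developed SpecBEV, an end-to-end framework for multi-view 3D object detection using bird's-eye-view (BEV) representations.
A frequency-prior spatial attention module (SA-Freq) uses fixed discrete cosine transform (DCT) bases to suppress redundant background activations and enhance target-related regions.
A cross-view feature alignment module (CFA) improves the geometric consistency between single-view BEV features and the fused BEV representation.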
On the nuScenes validation set, SpecBEV achieved a mean Average Precision (mAP) of 0.3856 and a nuScenes Detection Score (NDS) of 0.4871.
mAP improved by 0.1028 in absolute terms (a 36.35% relative improvement) over the BEVDet baseline.
NDS improved by 0.1371 in absolute terms (a 39.17% relative improvement) over the same baseline.
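
The relative improvements quoted above can be checked with a few lines of arithmetic. The sketch below assumes that each relative figure is the absolute gain divided by the baseline score, and that the baseline is the SpecBEV score minus the absolute gain. The baseline values it prints are derived under that assumption and are not stated in the text.

```python
# Reported SpecBEV scores and absolute gains over BEVDet (taken from the key points).
specbev = {"mAP": 0.3856, "NDS": 0.4871}
abs_gain = {"mAP": 0.1028, "NDS": 0.1371}


def relative_gain(score, gain):
    """Return (implied baseline, relative improvement in percent).

    Assumption: baseline = score - gain, relative = gain / baseline.
    """
    baseline = score - gain
    return baseline, 100.0 * gain / baseline


results = {m: relative_gain(specbev[m], abs_gain[m]) for m in specbev}
for metric, (baseline, rel) in results.items():
    print(f"{metric}: implied baseline={baseline:.4f}, relative gain={rel:.2f}%")
# mAP: implied baseline=0.2828, relative gain=36.35%
# NDS: implied baseline=0.3500, relative gain=39.17%
```

Both derived percentages match the reported figures, so the absolute and relative gains are mutually consistent.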

Abstract

This paper proposes SpecBEV, an enhanced multi-view 3D object detection framework for autonomous driving using bird’s-eye-view (BEV) representations. Compared with LiDAR-based methods, multi-camera perception offers higher cost-effectiveness and flexibility. However, existing end-to-end BEV detectors suffer from illumination variations, occlusions, and cross-view inconsistencies during feature projection and fusion. These issues often introduce redundant background activations and geometric misalignment in the BEV space, leading to missed detections, false positives, and unstable localization. To address these issues, we introduce a frequency-prior spatial attention module (SA-Freq). It uses fixed discrete cosine transform (DCT) bases to model the multi-band responses of BEV features and produce spatial attention weights that suppress redundant activations and enhance target-related regions. We further design a cross-view feature alignment module (CFA) to ensure consistency between single-view BEV features and the fused BEV representation, thereby reducing geometric inconsistency and improving localization stability. Experiments on the nuScenes validation set demonstrate that SpecBEV achieves a mean Average Precision (mAP) of 0.3856 and a nuScenes Detection Score (NDS) of 0.4871. Compared with the BEVDet baseline, it yields an absolute gain of 0.1028 (36.35% relative improvement) in mAP and an absolute gain of 0.1371 (39.17% relative improvement) in NDS, which validates the effectiveness of the proposed method.
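
To make the idea of frequency-prior spatial attention more concrete, here is a minimal NumPy sketch of one way fixed DCT bases could be turned into a spatial attention map over BEV features. It is not the authors' SA-Freq implementation: the function names, the radial band split (`band_edges`), the fixed `band_weights` (which a trained model would presumably learn), and the sigmoid gating are all assumptions made for illustration.

```python
import numpy as np


def dct_matrix(n):
    """Orthonormal DCT-II basis matrix of shape (n, n); row k holds frequency k."""
    k = np.arange(n)[:, None]
    i = np.arange(n)[None, :]
    d = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    d[0, :] = np.sqrt(1.0 / n)
    return d


def freq_spatial_attention(bev, band_edges=(0.0, 0.25, 0.5, 1.0), band_weights=None):
    """Hypothetical DCT-based spatial attention.

    bev: array of shape (C, H, W). Returns (reweighted features, (H, W) attention map).
    """
    c, h, w = bev.shape
    dh, dw = dct_matrix(h), dct_matrix(w)
    # Fixed (non-learned) 2D DCT of every channel.
    coeff = np.einsum("uh,chw,vw->cuv", dh, bev, dw)

    # Normalised radial frequency of each coefficient, in [0, 1].
    fu = np.arange(h)[:, None] / max(h - 1, 1)
    fv = np.arange(w)[None, :] / max(w - 1, 1)
    radius = np.sqrt(fu**2 + fv**2) / np.sqrt(2.0)

    n_bands = len(band_edges) - 1
    if band_weights is None:
        # Assumption: equal weights; a real module would likely learn these.
        band_weights = np.ones(n_bands) / n_bands

    logits = np.zeros((h, w))
    for b in range(n_bands):
        lo, hi = band_edges[b], band_edges[b + 1]
        is_last = b == n_bands - 1  # the last band also includes its upper edge
        mask = (radius >= lo) & ((radius < hi) | is_last)
        # Inverse DCT of the band-limited coefficients: per-band spatial response.
        band = np.einsum("uh,cuv,vw->chw", dh, coeff * mask, dw)
        logits += band_weights[b] * np.abs(band).mean(axis=0)

    # Standardise, then squash to (0, 1) so the map acts as a soft spatial gate.
    logits = (logits - logits.mean()) / (logits.std() + 1e-6)
    attn = 1.0 / (1.0 + np.exp(-logits))
    return bev * attn[None], attn


rng = np.random.default_rng(0)
bev = rng.standard_normal((4, 16, 16))  # toy BEV tensor: 4 channels, 16x16 grid
out, attn = freq_spatial_attention(bev)
print(out.shape, attn.shape)  # (4, 16, 16) (16, 16)
```

Because the DCT bases are fixed, the only quantities that would need training in this sketch are the band weights, which keeps the attention branch lightweight. Cells with weak multi-band response receive a gate value closer to 0 and are attenuated, which is the intuition behind suppressing redundant background activations.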
