As autonomous driving advances, demand for efficient object detection frameworks is growing. Although deep learning detectors perform well overall, detecting small objects in complex scenes remains challenging. We propose AD-DETR, a Transformer-based model that applies Multi-scale Multi-head Self-Attention to backbone features. A cross-scale fusion module captures local features, while an enhanced fusion block improves adaptability to occlusions and scale changes. A content-aware deformable attention mechanism optimizes weighting. Experiments show that AD-DETR achieves 52.7% mAP@0.5 on the BDD100K dataset, outperforming RT-DETR by 2.7%.
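For readers unfamiliar with the reported metric: mAP@0.5 counts a predicted box as correct when its intersection over union (IoU) with a ground-truth box of the same class is at least 0.5. The sketch below illustrates only that matching criterion; it is not the evaluation code used for AD-DETR, and the function names `iou` and `is_true_positive` and the `(x1, y1, x2, y2)` box format are assumptions made for illustration.

```python
def iou(box_a, box_b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    # Corners of the overlapping rectangle (may be empty).
    ix1 = max(box_a[0], box_b[0])
    iy1 = max(box_a[1], box_b[1])
    ix2 = min(box_a[2], box_b[2])
    iy2 = min(box_a[3], box_b[3])

    # Clamp at zero so disjoint boxes yield no intersection.
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)

    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def is_true_positive(pred_box, gt_box, threshold=0.5):
    """Hypothetical helper: the IoU test behind the '@0.5' in mAP@0.5."""
    return iou(pred_box, gt_box) >= threshold
```

Under this criterion, a 2-pixel diagonal offset on an 8×8 box gives an IoU of about 0.39 and fails the threshold, while the same offset on a 100×100 box gives about 0.92 and passes. This is one reason small objects are harder to score well on.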
Hongyue Li studied this question.