This paper proposes an improved RT-DETR (Real-Time Detection Transformer) method to address key challenges in highway target detection: ambiguous feature recognition, missed detections of multi-scale objects, and the reliance on manually designed components in traditional models. The method introduces a CGResNet (Context-Guided ResNet) backbone network that integrates local features, environmental context, and global semantic information to strengthen feature representation for indistinct targets. It further employs an Attention-based Intra-scale Feature Interaction (AIFI) module and a CNN-driven Cross-scale Feature Fusion (CCFF) module to make semantic integration across multi-scale features more efficient. Experimental results on the Jiangsu Highway Dataset show that the improved model achieves 45.4% Average Precision (AP) and 72.6% AP50. It outperforms the original RT-DETR on critical metrics such as AP75 (+3.0%) and APl (+0.6%), with particularly enhanced robustness in low-light and blurry scenarios. The model's real-time performance (69 FPS) is slightly lower than that of YOLOv11-S because of the computational complexity of the CGBlock. However, its superior detection accuracy and adaptability to varied scenes make it a robust, high-precision end-to-end solution for intelligent highway surveillance systems.
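To make the difference between the AP50 and AP75 figures concrete, the following minimal sketch shows how one predicted box can count as a correct detection at an IoU threshold of 0.5 but not at 0.75. It assumes COCO-style metric definitions, which the AP/AP50/AP75/APl notation suggests but the abstract does not state. The function names `iou` and `is_true_positive` and the box coordinates are hypothetical and chosen only for illustration; they are not taken from the paper.

```python
def iou(box_a, box_b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    # Corners of the overlapping rectangle, if any.
    ix1 = max(box_a[0], box_b[0])
    iy1 = max(box_a[1], box_b[1])
    ix2 = min(box_a[2], box_b[2])
    iy2 = min(box_a[3], box_b[3])

    # Clamp to zero when the boxes do not overlap.
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)

    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def is_true_positive(pred_box, gt_box, threshold):
    """A prediction matches a ground-truth box if IoU meets the threshold."""
    return iou(pred_box, gt_box) >= threshold


# Hypothetical boxes: a 100x100 ground truth and a prediction shifted by 10 px.
gt_box = (100, 100, 200, 200)
pred_box = (110, 110, 210, 210)

print(round(iou(pred_box, gt_box), 3))            # 0.681
print(is_true_positive(pred_box, gt_box, 0.50))   # True
print(is_true_positive(pred_box, gt_box, 0.75))   # False
```

Under this assumption, a gain in AP75 reflects tighter box localization rather than simply more objects being found, since a loosely fitting box like the one above is rejected at the stricter threshold.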
Zixuan Fu
Junyong Zhai
IET conference proceedings.
Southeast University
Southeast University
Fu et al. (Fri,) studied this question.
www.synapsesocial.com/papers/69fed0abb9154b0b82877c7e — DOI: https://doi.org/10.1049/icp.2026.1904