Camera-based traffic object detection is a critical task for autonomous vehicles. At night, however, adverse lighting introduces several challenges: blurred object edges, large variations in object scale, and complex illumination that mixes overexposure and underexposure. As a result, vision-based perception still struggles to deliver reliable precision and fast inference at the same time. This paper proposes an efficient, lightweight vision module for detecting traffic objects in challenging nighttime environments, built by enhancing the YOLOv8n architecture. First, a bidirectional feature pyramid network (BiFPN) with weighted feature fusion is incorporated into the path aggregation network, and an additional shallow P2 feature map is introduced to make fuller use of key information from features at different scales. Second, a coordinate attention (CA) module is inserted between the end of the feature pyramid and the detection head to capture both semantic and spatial information about objects. Finally, a dynamic upsampler (DySample) is employed to guide the model toward the detailed features of challenging samples, thereby balancing accuracy across object categories. A subset of nighttime traffic scenes curated from the BDD100K dataset is used to evaluate the proposed approach. Experiments show that, relative to the baseline, our method raises mean average precision (mAP50) from 51.5% to 56.6%, reduces the parameter count by 7.3%, and maintains a fast inference speed of 208 FPS. Detection accuracy improves notably for the challenging bike and motorbike categories. Compared with other advanced YOLO-series models such as YOLOv11, the proposed model also achieves a 3.7% higher mAP50. Furthermore, our model generalizes well to the larger BDD100K nighttime partition.
The findings confirm that our approach significantly improves detection accuracy without compromising real-time processing, highlighting its potential as a lightweight vision module providing reliable perceptual inputs for autonomous vehicle control and safety actuators in challenging nighttime scenarios.
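To make the weighted fusion idea concrete, the following is a minimal sketch of the fast normalized fusion rule commonly associated with BiFPN: each input feature gets a learnable scalar weight, the weights are clipped to be non-negative, and the result is a normalized weighted sum. This is an illustration under stated assumptions, not the implementation used in this work: the function name `fast_normalized_fusion`, the `eps` value, and the use of flat Python lists in place of feature tensors are all hypothetical choices made for readability. In a real network the inputs would be feature maps already resized to a common resolution, and the weights would be trained parameters.

```python
from typing import List, Sequence


def fast_normalized_fusion(
    features: Sequence[Sequence[float]],
    raw_weights: Sequence[float],
    eps: float = 1e-4,  # assumed small constant to avoid division by zero
) -> List[float]:
    """Fuse same-sized feature vectors with non-negative, normalized weights.

    Hypothetical helper for illustration: each entry of `features` stands in
    for a flattened feature map that has already been resized to a common
    shape; `raw_weights` stands in for the learnable per-input scalars.
    """
    if not features:
        raise ValueError("at least one feature vector is required")
    if len(features) != len(raw_weights):
        raise ValueError("one weight is required per feature vector")
    length = len(features[0])
    if any(len(f) != length for f in features):
        raise ValueError("all feature vectors must have the same length")

    # ReLU keeps every weight non-negative so the normalization is stable.
    weights = [max(0.0, w) for w in raw_weights]
    total = sum(weights) + eps

    # Normalized weighted sum, computed element by element.
    return [
        sum(w * f[i] for w, f in zip(weights, features)) / total
        for i in range(length)
    ]


if __name__ == "__main__":
    shallow = [1.0, 2.0]  # stand-in for a shallow, high-resolution feature
    deep = [3.0, 4.0]     # stand-in for a deeper, semantically richer feature
    print(fast_normalized_fusion([shallow, deep], [1.0, 1.0]))
```

With equal weights the output is close to the element-wise mean of the inputs; a negative raw weight is clipped to zero, so that input is effectively ignored in the fusion.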
Ou et al. (Tue,) studied this question.