In complex traffic environments, image degradation caused by haze, low illumination, and occlusion significantly undermines the reliability of vehicle and pedestrian detection. To address these challenges, this paper proposes an aerial vision framework that tightly couples multi-level image enhancement with a lightweight detection architecture. At the image preprocessing stage, a cascaded “dehazing + enhancement” module is constructed: a learning-based dehazing method first restores long-range details degraded by scattering, and an enhancement step then improves structural fidelity in low-light regions while keeping global brightness consistent. On the detection side, a lightweight yet robust architecture, termed GDEIM-SF, is designed. It adopts GoldYOLO as the lightweight backbone and integrates D-FINE as an anchor-free decoder. In addition, two key modules, CAPR and ASF, are incorporated to strengthen high-frequency edge modeling and multi-scale semantic alignment. On the VisDrone dataset, the proposed method improves core metrics such as mAP@50-90 by approximately 2.5 to 2.7 percentage points over comparable lightweight models while maintaining a low parameter count and computational overhead. The framework thus balances detection accuracy, inference efficiency, and deployment adaptability, providing a practical and efficient solution for UAV-based visual perception under challenging imaging conditions.
Zheng et al. studied this question.
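The cascaded “dehazing + enhancement” preprocessing described in the abstract can be pictured as a simple composition of image-to-image stages. The sketch below is illustrative only and is not the authors' implementation: the learned dehazer is replaced by a hypothetical stand-in (`dehaze_placeholder`) that inverts the standard atmospheric scattering model I = J·t + A·(1 − t) with fixed, assumed values for airlight A and transmission t, and the low-light enhancement is replaced by plain gamma correction (`gamma_enhance`). All function names, parameters, and constants are assumptions introduced for illustration.

```python
from typing import Callable, List, Sequence

# A tiny grayscale "image": rows of pixel intensities in [0, 1].
Image = List[List[float]]
Stage = Callable[[Image], Image]


def clip01(x: float) -> float:
    """Clamp a value into the valid intensity range [0, 1]."""
    return min(1.0, max(0.0, x))


def dehaze_placeholder(img: Image, airlight: float = 0.9,
                       transmission: float = 0.7) -> Image:
    """Hypothetical stand-in for a learned dehazer.

    Inverts I = J*t + A*(1 - t) with constant, assumed A and t.
    A learned model would estimate these (or J directly) per pixel.
    """
    offset = airlight * (1.0 - transmission)
    return [[clip01((p - offset) / transmission) for p in row] for row in img]


def gamma_enhance(img: Image, gamma: float = 0.6) -> Image:
    """Assumed low-light enhancement: gamma < 1 brightens dark regions."""
    return [[p ** gamma for p in row] for row in img]


def cascade(stages: Sequence[Stage]) -> Stage:
    """Compose stages so each one consumes the previous stage's output."""
    def run(img: Image) -> Image:
        for stage in stages:
            img = stage(img)
        return img
    return run


# Dehazing runs first, enhancement second, mirroring the cascade order.
preprocess = cascade([dehaze_placeholder, gamma_enhance])

hazy = [[0.30, 0.45], [0.60, 0.90]]
clean = preprocess(hazy)
```

In a full pipeline, the output of `preprocess` would be passed to the detector; keeping each stage as an independent callable makes it easy to swap the placeholder for a trained dehazing network without touching the rest of the chain.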