During UAV aerial photography tasks, influenced by flight altitude and imaging mechanisms, the target in images often exhibits characteristics such as small size, complex backgrounds, and small inter-class differences. Under single optical modality, the weak and less discriminative feature representation of targets in drone-captured images makes them easily overwhelmed by complex background noise, leading to low detection accuracy, high missed-detection and false-detection rates in current object detection networks. Moreover, such methods struggle to meet all-weather and all-scenario application requirements. To address these issues, this paper proposes DVIF-Net, a visible-infrared fusion network for small-target detection in UAV aerial images, which leverages the complementary characteristics of visible and infrared images to enhance detection capability in complex environments. Firstly, a dual-branch feature extraction structure is designed based on YOLO architecture to separately extract features from visible and infrared images. Secondly, a P4-level cross-modal fusion strategy is proposed to effectively integrate features from both modalities while reducing computational complexity. Meanwhile, we design a novel dual context-guided fusion module to capture complementary features through channel attention of visible and infrared images during fusion and enhance interaction between modalities via element-wise multiplication. Finally, an edge information enhancement module based on cross stage partial structure is developed to improve sensitivity to small-target edges. Experimental results on two cross-modal datasets, DroneVehicle and VEDAI, demonstrate that DVIF-Net achieves detection accuracies of 85.8% and 62%, respectively. Compared with YOLOv10n, it has improved by 21.7% and 10.5% in visible modality, and by 7.4% and 30.5% in infrared modality, while maintaining a model parameter count of only 2.49 M. Furthermore, compared with 15 other algorithms, the proposed DVIF-Net attains SOTA performance. These results indicate that the method significantly enhances the detection capability for small targets in UAV aerial images, offering a high-precision and lightweight solution for real-time applications in complex aerial scenarios.
Zhao et al. (Sat,) studied this question.