Unmanned aerial vehicle (UAV) remote sensing has become essential for real-time Earth observation applications, including precision agriculture, traffic monitoring, and disaster response. However, small-target detection in UAV aerial imagery still faces three critical challenges: extreme scale variation caused by varying flight altitudes, background interference from complex terrain, and insufficient pixel information for tiny objects. To address these issues, this work proposes FKIFM-DETR, a real-time transformer-based detection framework that leverages multi-domain information fusion. First, a Spatial-Frequency Fusion Module (SFM) integrates spatial- and frequency-domain features to capture fine-grained target details while suppressing background noise. Second, a High–Low Frequency Block (HL-Block) processes high-frequency local details and low-frequency global context separately, balancing detail retention and semantic awareness. Finally, a Channel Feature Recalibration-Enhanced Feature Pyramid Network (SPCR-FPN) strengthens the interaction between shallow spatial features and deep semantic features. On the VisDrone2019 dataset, FKIFM-DETR improves mAP@0.5 and mAP@0.5:0.95 by 6.3% and 5.3%, respectively, over the RT-DETR baseline; evaluations on the TinyPerson and HIT-UAV datasets further support its cross-scenario applicability. These results indicate the potential of FKIFM-DETR for practical UAV remote sensing applications such as crowd surveillance, vehicle tracking, and emergency rescue.
Yang et al. (Thu,) studied this question.
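The idea of processing high-frequency local detail and low-frequency global context separately can be illustrated with a minimal sketch. This is not the HL-Block described in the paper, whose internal design is not specified here. It is a generic FFT-based decomposition, and the function name `split_frequencies` and the `cutoff` parameter are hypothetical, chosen only for illustration.

```python
import numpy as np

def split_frequencies(feature, cutoff=0.25):
    """Split a 2-D map into low- and high-frequency parts (illustrative only).

    `cutoff` is an assumed radius in normalized frequency units
    (cycles per sample); it is not a value taken from the paper.
    """
    h, w = feature.shape
    # Normalized frequency coordinates along each axis, in [-0.5, 0.5).
    fy = np.fft.fftfreq(h)[:, None]
    fx = np.fft.fftfreq(w)[None, :]
    # Radial low-pass mask; its complement acts as the high-pass mask.
    low_mask = np.sqrt(fy**2 + fx**2) <= cutoff
    spectrum = np.fft.fft2(feature)
    low = np.fft.ifft2(spectrum * low_mask).real
    high = np.fft.ifft2(spectrum * ~low_mask).real
    return low, high

# Usage: the two branches sum back to the original map.
rng = np.random.default_rng(0)
feature = rng.normal(size=(32, 32))
low, high = split_frequencies(feature)
assert np.allclose(low + high, feature)
```

Because the two masks are complementary, no information is lost in the split. A detector could then apply different operations to each branch, for example local convolutions on the high-frequency part and global attention on the low-frequency part, before fusing them again.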