In recent years, unmanned aerial vehicles (UAVs) have become increasingly prevalent across diverse application scenarios owing to their high maneuverability, compact size, and cost-effectiveness. However, these same characteristics make UAVs difficult to detect in complex environments. This paper proposes an efficient feature perception network (EFPNet) for UAV detection, built on the RT-DETR framework. Specifically, a dual-branch HiLo-ConvMix attention (HCM-Attn) mechanism and a pyramid sparse feature transformer network (PSFT-Net) are introduced, and a DySample dynamic upsampling module is integrated. The HCM-Attn module facilitates interaction between high- and low-frequency information, effectively suppressing background noise. PSFT-Net uses deep-level features to guide the encoding and fusion of shallow features, thereby enhancing the model’s ability to perceive UAV texture characteristics. Furthermore, the DySample module efficiently reconstructs feature representations during upsampling. On the TIB and Drone-vs-Bird datasets, the proposed EFPNet achieves mAP50 scores of 94.1% and 98.1%, improvements of 3.2% and 1.9% over the baseline, respectively. These experimental results demonstrate the effectiveness of the proposed method for small UAV detection.
Huang et al. studied this question.
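The mAP50 figures reported above rest on an intersection-over-union (IoU) threshold of 0.5: a predicted box counts as a correct detection only if its IoU with a ground-truth box is at least 0.5. The following is a minimal sketch of that matching criterion alone, not the authors' evaluation code; the function names `iou` and `is_true_positive`, the `(x1, y1, x2, y2)` box format, and the example coordinates are all assumptions made for illustration. A full mAP50 computation would additionally rank predictions by confidence and average precision over recall levels and classes.

```python
def iou(box_a, box_b):
    """Intersection over union of two axis-aligned boxes.

    Boxes are assumed to be (x1, y1, x2, y2) with x1 < x2 and y1 < y2.
    """
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b

    # Width and height of the overlap region, clamped at zero when disjoint.
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h

    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def is_true_positive(pred_box, gt_box, threshold=0.5):
    """Hypothetical helper: the IoU >= 0.5 test that the '50' in mAP50 refers to."""
    return iou(pred_box, gt_box) >= threshold


# Illustrative 10x10-pixel target (made-up coordinates, not dataset values).
gt = (10, 10, 20, 20)
pred_near = (12, 10, 22, 20)  # shifted 2 px horizontally
pred_far = (16, 10, 26, 20)   # shifted 6 px horizontally

print(is_true_positive(pred_near, gt))
print(is_true_positive(pred_far, gt))
```

For a target this small, a 2-pixel shift still passes the threshold while a 6-pixel shift does not, which illustrates why localization precision matters so much when the objects are small UAVs.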