Abstract

Small object detection in UAV aerial imagery is hampered by insufficient feature representation, severe background noise interference, and inadequate multi-scale fusion. To address these issues, this study proposes UMS-DET (Unmanned Aerial Vehicle Multi-scale Small-object Detector), a framework designed to improve multi-scale representation and small object discrimination in UAV views. First, we design the MSCGNet (Multi-Scale Context-Gated Network) backbone, which strengthens multi-scale feature representation through hierarchical context synergy aggregation and adaptive gated fusion while reducing the parameter count. Second, we introduce the SHFFI (Sparse Hierarchical Frequency Feature Interaction) encoder, which combines sparse window attention with frequency-domain enhancement to suppress background noise and improve discrimination of small targets. Third, we propose the SOFCCFF (Small Object-Focused Cross-scale Feature Fusion) module, which uses high-resolution shallow features to preserve fine-grained details during multi-scale integration. Experiments show that UMS-DET improves AP50 by 5.2% and 3.0% on the VisDrone and DOTA datasets, respectively, while reducing parameters by 22.1% and running in real time at 67.3 FPS. The proposed method shows superior accuracy and robustness in dense small object scenarios and complex environments, offering an efficient and practical solution for UAV intelligent perception systems.
Hu et al. studied this question.
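The abstract names adaptive gated fusion as one ingredient of the MSCGNet backbone but does not spell out its form. As a rough illustration of the general idea only, and not the authors' implementation, the sketch below blends two feature vectors with a per-element sigmoid gate. The function name `gated_fusion` and the scalar parameters `w_fine`, `w_coarse`, and `bias` are hypothetical stand-ins for what would be learned convolutional weights in a real network.

```python
import math


def sigmoid(x: float) -> float:
    """Logistic function mapping any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def gated_fusion(fine, coarse, w_fine=1.0, w_coarse=1.0, bias=0.0):
    """Blend two same-length feature vectors with a per-element gate.

    Assumption: the gate is a sigmoid of a weighted sum of both inputs.
    The scalar weights are hypothetical placeholders for learned layers.
    """
    if len(fine) != len(coarse):
        raise ValueError("feature vectors must have the same length")
    fused = []
    for f, c in zip(fine, coarse):
        gate = sigmoid(w_fine * f + w_coarse * c + bias)
        # Convex combination: a gate near 1 favours the fine feature,
        # a gate near 0 favours the coarse feature.
        fused.append(gate * f + (1.0 - gate) * c)
    return fused


# Toy features standing in for a high-resolution (fine) map and a
# low-resolution (coarse) map that were already resized to match.
fine = [0.9, 0.1, -0.4]
coarse = [0.2, 0.8, 0.5]
fused = gated_fusion(fine, coarse)
print(fused)
```

Because the gate lies strictly between 0 and 1, each fused value stays between the two inputs it blends. A learned gate would let the network decide, per position, how much fine detail to keep relative to coarse context.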