In unmanned aerial vehicle (UAV)-based small object detection, enhancing object perception against complex backgrounds remains a critical challenge for current detection models. Because UAV small objects are small in scale, occupy few pixels, and appear against cluttered backgrounds, their discriminative features tend to attenuate in deep networks, which limits multi-scale feature fusion and makes detection more difficult. To address these issues, this paper presents a Detail-Aware Multi-scale Fusion Network (DMF-Net) for UAV small object detection, built on two core modules: the Multi-Branch Detail-Enhanced (MBDE) module and the Dual-Attention Fusion (DAF) module. First, during feature extraction, the MBDE module, which uses serial convolutions in each branch together with residual connections, strengthens high-frequency details and local textures while preserving semantic consistency. Second, at the feature fusion stage, the DAF module, which has a symmetric interaction structure, dynamically weights features at different scales through adaptive attention mechanisms, enabling symmetric cross-scale interaction and fine-grained feature complementarity. Experimental results on the challenging VisDrone and TinyPerson datasets show that DMF-Net outperforms existing state-of-the-art detection methods in small object detection accuracy while maintaining high inference efficiency. On the TinyPerson dataset, DMF-Net improves AP by 1.4%, AP₅₀ by 4.1%, and AP₇₅ by 0.6% compared with YOLO11n, while running at 97.2 FPS. Furthermore, it shows promising performance in complex backgrounds and densely populated scenes.
Liu et al. studied this question.
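The abstract describes the two modules only at a high level, so the following PyTorch sketch is an illustration of the general ideas rather than the authors' implementation. The class names `MultiBranchDetailBlock` and `DualAttentionFusion`, the branch depths, the kernel sizes, the normalization and activation choices, and the particular gating scheme are all assumptions introduced here. The first class shows parallel branches of serial convolutions merged through a residual connection; the second shows one possible symmetric fusion in which each scale is gated by attention computed from the other.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class MultiBranchDetailBlock(nn.Module):
    """Hypothetical multi-branch block: branch i stacks depths[i] serial 3x3 convs."""

    def __init__(self, channels: int, depths=(1, 2, 3)):
        super().__init__()
        self.branches = nn.ModuleList(
            [nn.Sequential(*[self._conv(channels) for _ in range(d)]) for d in depths]
        )
        # A 1x1 conv merges the concatenated branch outputs back to `channels`.
        self.merge = nn.Conv2d(channels * len(depths), channels, kernel_size=1)

    @staticmethod
    def _conv(channels: int) -> nn.Sequential:
        # Assumed conv unit: 3x3 conv, batch norm, SiLU (not stated in the abstract).
        return nn.Sequential(
            nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False),
            nn.BatchNorm2d(channels),
            nn.SiLU(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        out = torch.cat([branch(x) for branch in self.branches], dim=1)
        # Residual connection keeps the input features alongside the added detail.
        return x + self.merge(out)


class DualAttentionFusion(nn.Module):
    """Hypothetical symmetric fusion of a shallow and a deep feature map."""

    def __init__(self, channels: int):
        super().__init__()
        # Channel attention: global average pooling -> 1x1 conv -> sigmoid gate.
        self.channel_gate = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, channels, kernel_size=1),
            nn.Sigmoid(),
        )
        # Spatial attention: 7x7 conv over channel-wise mean and max maps.
        self.spatial_gate = nn.Sequential(
            nn.Conv2d(2, 1, kernel_size=7, padding=3),
            nn.Sigmoid(),
        )

    def _spatial(self, x: torch.Tensor) -> torch.Tensor:
        stats = torch.cat(
            [x.mean(dim=1, keepdim=True), x.amax(dim=1, keepdim=True)], dim=1
        )
        return self.spatial_gate(stats)

    def forward(self, shallow: torch.Tensor, deep: torch.Tensor) -> torch.Tensor:
        # Bring the deep (coarser) map to the shallow map's resolution.
        deep = F.interpolate(deep, size=shallow.shape[-2:], mode="nearest")
        # Symmetric interaction: each input is gated by attention from the other.
        shallow_out = shallow * self.channel_gate(deep)
        deep_out = deep * self._spatial(shallow)
        return shallow_out + deep_out


if __name__ == "__main__":
    block = MultiBranchDetailBlock(8).eval()
    fuse = DualAttentionFusion(8).eval()
    shallow = torch.randn(2, 8, 32, 32)  # high-resolution features
    deep = torch.randn(2, 8, 16, 16)     # low-resolution features
    with torch.no_grad():
        fused = fuse(block(shallow), deep)
    print(fused.shape)
```

In this sketch both inputs are assumed to share the same channel count; a real detector neck would typically add a 1x1 projection before fusion when the channel widths differ.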