Unmanned aerial vehicles (UAVs) are widely used for aerial photography and target detection because of their flexibility and unique viewing perspective. However, imaging conditions such as high altitude, long capture distance, and wide field of view leave small targets with insufficient resolution, uneven scale distribution, and complex background clutter. These factors weaken the model’s feature representation and generalization ability, and they are the key bottleneck limiting small target detection accuracy in UAV scenarios. To address these issues, this paper proposes MTD-YOLO, a small target detection algorithm for the UAV perspective. First, a Parallel Multi-Scale Receptive Field Unit (PMSRFU) is designed. By introducing parallel branches with convolutional kernels of different sizes, this unit enlarges the receptive field of feature extraction and strengthens the fusion of multi-scale contextual information. Second, PMSRFU is embedded into a C2f block to form C2f-PMSRFU, which reuses shallow details and fuses multi-scale features to sharpen the edges and textures of small targets, yielding stronger fine-grained representations. Finally, an efficient detection head, SDIDA-Head, is proposed; it combines task decoupling, dynamic alignment, and adaptive scale adjustment, and improves the model’s small target detection accuracy. Extensive experiments on the VisDrone2019 and HazyDet datasets show that, compared with the baseline YOLOv8n, MTD-YOLO increases mAP@0.5 by 7.6% and 6.6%, Precision by 6.0% and 1.1%, and Recall by 7.5% and 6.9%, respectively. These results support the effectiveness of the proposed method and its advantage over the baseline in UAV small target detection tasks.
Xie et al. studied this question.
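The parallel-branch idea described for PMSRFU in the abstract can be illustrated with a small toy sketch. It is not the authors' implementation: the abstract does not give the kernel sizes, the fusion rule, or the layer layout, so the kernel sizes (1, 3, 5), the fixed mean filters standing in for learned convolutions, and the averaging fusion are all assumptions made here for illustration. The names `conv_same` and `parallel_multiscale` are hypothetical.

```python
from typing import List, Sequence

Grid = List[List[float]]


def conv_same(img: Grid, k: int) -> Grid:
    """Apply a k x k mean filter with zero padding (output size equals input size).

    Assumption: a fixed mean filter stands in for a learned convolution.
    """
    h, w = len(img), len(img[0])
    r = k // 2  # half-width of the kernel; k is expected to be odd
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            total = 0.0
            for dy in range(-r, r + 1):
                for dx in range(-r, r + 1):
                    yy, xx = y + dy, x + dx
                    if 0 <= yy < h and 0 <= xx < w:  # zero padding outside the grid
                        total += img[yy][xx]
            out[y][x] = total / (k * k)
    return out


def parallel_multiscale(img: Grid, kernel_sizes: Sequence[int] = (1, 3, 5)) -> Grid:
    """Run one branch per kernel size on the same input and fuse the outputs.

    Assumption: branches are fused by element-wise averaging; the paper's
    actual fusion rule is not stated in the abstract.
    """
    branches = [conv_same(img, k) for k in kernel_sizes]
    h, w = len(img), len(img[0])
    return [
        [sum(b[y][x] for b in branches) / len(branches) for x in range(w)]
        for y in range(h)
    ]


if __name__ == "__main__":
    # A single bright pixel plays the role of a very small target.
    image = [[0.0] * 7 for _ in range(7)]
    image[3][3] = 1.0
    for row in parallel_multiscale(image):
        print(" ".join(f"{v:.3f}" for v in row))
```

In this sketch the fused response to a single-pixel target is strongest at the target itself, where every branch contributes, and extends two pixels outward through the largest kernel. This shows how parallel branches with different kernel sizes combine fine detail with wider context in one output map.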