Abstract. Small object detection in unmanned aerial vehicle (UAV) imagery is challenging due to factors such as small target scales, complex backgrounds, and noise interference. To enhance multi-scale feature representation and detection efficiency, this paper proposes MSEF-YOLO11s. First, we design a lightweight partial multi-scale (LPMS) module, which aggregates cross-scale information and enhances multi-scale representations of small objects in the backbone. Second, to dynamically adjust feature weights and mitigate feature conflicts in the neck, we devise a multi-scale boundary-semantic alignment (MS-BSA) module based on adaptive attention, which fuses features sufficiently while avoiding computational redundancy. Finally, a lightweight shared detail detection head (LSDDH) replaces the decoupled head structure with shared convolutional layers, addressing the parameter explosion associated with adding a dedicated small object detection head. Experimental results demonstrate the effectiveness of the proposed model: compared with the baseline YOLO11s, MSEF-YOLO11s improves mAP50 by 6.6% on the VisDrone2019 test set with an increase of only 4.4M parameters. Furthermore, mAP50 on the TinyPerson test set increases from 22.8% to 28.1%, indicating strong generalization capability.
Zhang et al. (Tue,) studied this question.