What question did this study set out to answer?

The aim is to improve the efficiency and accuracy of small-target detection in UAV imagery using multi-scale features.

January 22, 2026 | Open Access

MSEF-YOLO11s: a multi-scale extraction and fusion network for small target detection in drone imagery

Key Points

Developed a lightweight partial multi-scale module for better feature representation.
Introduced a multi-scale boundary-semantic alignment mechanism for dynamic feature weighting.
Created a lightweight shared detail detection head to limit the parameter growth caused by adding a dedicated small object detection head.
Achieved a 6.6% increase in mAP50 on the VisDrone2019 test set compared to the baseline.
Improved mAP50 from 22.8% to 28.1% on the TinyPerson test set, indicating stronger generalization.

Abstract

Small object detection in unmanned aerial vehicle (UAV) imagery faces substantial challenges due to small target scales, complex backgrounds, and noise interference, among other factors. To enhance multi-scale feature representation and detection efficiency, this paper proposes MSEF-YOLO11s. First, we design a lightweight partial multi-scale (LPMS) module, which aggregates cross-scale information and enhances multi-scale representations of small objects in the backbone. Second, to dynamically adjust feature weights and mitigate feature conflicts in the neck, we devise a multi-scale boundary-semantic alignment (MS-BSA) mechanism based on adaptive attention, which further avoids computational redundancy while achieving sufficient fusion. Finally, a lightweight shared detail detection head (LSDDH) replaces the decoupled head structure with shared convolutional layers, resolving the parameter explosion associated with adding a dedicated small object detection head. Experimental results demonstrate the effectiveness of the proposed model: compared to the baseline YOLO11s, MSEF-YOLO11s improves mAP50 by 6.6% on the VisDrone2019 test set, with an increase of only 4.4M parameters. Furthermore, mAP50 on the TinyPerson test set increases from 22.8% to 28.1%, confirming the model’s strong generalization capability.
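
To illustrate why sharing convolutional layers across detection scales curbs parameter growth, the following is a minimal parameter-counting sketch. It is not the authors' LSDDH: the channel widths, the hidden width, the layer depth, and the function names are all hypothetical values chosen for illustration, and the sketch only compares a head that gives each scale its own convolution stack with one that reuses a single stack for every scale.

```python
# Hypothetical parameter count: per-scale conv stacks vs. one shared stack.
# All sizes below are illustrative assumptions, not values from the paper.

def conv_params(c_in, c_out, k, bias=True):
    """Parameter count of a single k x k convolution layer."""
    return c_in * c_out * k * k + (c_out if bias else 0)


def separate_heads(channels, hidden, k=3, depth=2):
    """Every scale owns a 1x1 projection plus its own stack of k x k convs."""
    total = 0
    for c in channels:
        total += conv_params(c, hidden, 1)
        total += depth * conv_params(hidden, hidden, k)
    return total


def shared_head(channels, hidden, k=3, depth=2):
    """Every scale owns a 1x1 projection; one k x k conv stack is reused by all."""
    total = sum(conv_params(c, hidden, 1) for c in channels)
    total += depth * conv_params(hidden, hidden, k)
    return total


# Four hypothetical feature levels (the extra one standing in for an added
# small-object branch) and a hypothetical common hidden width.
channels = (64, 128, 256, 512)
hidden = 64

separate = separate_heads(channels, hidden)
shared = shared_head(channels, hidden)
print(f"separate stacks: {separate} parameters")
print(f"shared stack:    {shared} parameters")
print(f"ratio:           {shared / separate:.2f}")
```

Under these assumptions, adding another scale to the shared design costs only one extra 1x1 projection, whereas the separate design also pays for a full additional convolution stack.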
