Small-object detection in UAV remote-sensing imagery underpins a wide range of modern applications. However, existing methods struggle with small-scale targets, dense occlusion, and complex backgrounds, which leads to missed and false detections. To address these issues, this paper proposes MACE-YOLO, a multi-path aggregation and cross-scale enhanced feature fusion network based on YOLOv11. In the backbone network, the proposed Multi-path Aggregation and Context-aware Fusion (MACF) module strengthens fine-grained feature representation. The Additive Cross-scale Feature Pyramid Network (ACFPN) improves the efficiency of cross-scale information interaction through a Channel-Additive Fusion (CAF) mechanism and multi-branch cross-layer connections. The Dynamic Head (DyHead) further optimizes feature re-weighting via multi-dimensional attention, while the Dilated Shared Pyramid Convolution (DSPC) module preserves the detailed features of small objects. Experimental results on the VisDrone2019 dataset show that MACE-YOLO improves ARₛ, APₛ, and mAP50 over YOLOv11s by 2.3%, 2.2%, and 4.1%, respectively, while maintaining a relatively low parameter count, which indicates a more favorable trade-off between accuracy and efficiency. Further evaluations on the RSOD and DIOR datasets confirm the algorithm’s superior generalization ability and performance.
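As a rough illustration of why an additive fusion step can help keep the parameter count low, the sketch below contrasts element-wise addition with channel concatenation. It is not the paper's CAF implementation: the helper names (`conv_params`, `fuse_add`, `fuse_concat`), the feature values, and the channel width of 256 are all assumptions chosen for the example. The only fact it relies on is that a standard k x k convolution has c_in * c_out * k * k weights.

```python
def conv_params(c_in: int, c_out: int, k: int = 1, bias: bool = False) -> int:
    """Weight count of a standard (ungrouped) k x k convolution."""
    return c_in * c_out * k * k + (c_out if bias else 0)


def fuse_add(a, b):
    """Element-wise additive fusion: the channel count stays the same."""
    if len(a) != len(b):
        raise ValueError("additive fusion needs matching channel counts")
    return [x + y for x, y in zip(a, b)]


def fuse_concat(a, b):
    """Concatenation fusion: the channel counts add up."""
    return list(a) + list(b)


# Hypothetical per-channel activations at one spatial position.
shallow = [0.2, 0.5, 0.1, 0.7]
deep = [0.4, 0.1, 0.3, 0.2]

added = fuse_add(shallow, deep)       # still 4 channels
stacked = fuse_concat(shallow, deep)  # now 8 channels

# Assumed channel width (illustrative, not taken from the paper).
c = 256
params_after_add = conv_params(c, c, k=3)         # conv sees c channels
params_after_concat = conv_params(2 * c, c, k=3)  # conv sees 2c channels

print(len(added), len(stacked))
print(params_after_add, params_after_concat)
```

Under these assumptions, the convolution that follows a concatenation needs twice as many weights as the one that follows an addition, because its input is twice as wide. How much of MACE-YOLO's efficiency actually comes from this effect is not stated in the abstract.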
Wang et al. (Tue,) studied this question.