Detecting small objects in unmanned aerial vehicle (UAV) remote sensing imagery is essential for real-world applications such as forest pest monitoring and smart city management. However, existing object detection algorithms struggle with densely packed objects, large scale variations, and cluttered backgrounds. To address these challenges, we propose CF-YOLO, a cross-layer feature fusion YOLO framework built on YOLOv11s. CF-YOLO integrates three key modules: (1) a split-block attention module (SBAM) that improves contextual perception by capturing both global semantic relationships and fine local details; (2) a cross-layer multi-scale feature fusion (CMFF) module that combines shallow spatial details with deep semantic features, improving the localization and recognition of small objects; and (3) a multi-branch downsampling module that preserves high-resolution shallow information through diverse downsampling paths, providing richer inputs for feature fusion. Evaluations on the VisDrone benchmark show that CF-YOLO outperforms YOLOv11s, improving precision by 7.57%, recall by 6.04%, and mAP@0.5 by 8.07% while maintaining a comparable number of parameters. These results indicate that the proposed method substantially improves small-object detection accuracy in UAV remote sensing images without a meaningful increase in model size.
Wei et al. studied this question.
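The abstract reports precision, recall, and mAP@0.5, where "@0.5" refers to an intersection-over-union (IoU) threshold of 0.5 for counting a predicted box as correct. The following is a minimal sketch of how precision and recall at that threshold can be computed for a single image and a single class. It is an illustration under stated assumptions, not the evaluation code used in the paper: the function names `iou` and `precision_recall`, the `(x1, y1, x2, y2)` box format, and the greedy highest-score-first matching are all choices made here for illustration. It also stops short of mAP, which additionally averages precision over recall levels and over classes.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def precision_recall(predictions, ground_truth, iou_threshold=0.5):
    """Precision and recall for one image and one class.

    predictions: list of (box, score) pairs (hypothetical format).
    ground_truth: list of boxes.
    Each ground-truth box can be matched by at most one prediction;
    predictions are processed from highest to lowest score.
    """
    matched = set()
    tp = 0
    for box, _score in sorted(predictions, key=lambda p: p[1], reverse=True):
        best_iou, best_idx = 0.0, None
        for idx, gt in enumerate(ground_truth):
            if idx in matched:
                continue
            overlap = iou(box, gt)
            if overlap > best_iou:
                best_iou, best_idx = overlap, idx
        if best_idx is not None and best_iou >= iou_threshold:
            matched.add(best_idx)
            tp += 1
    fp = len(predictions) - tp   # predictions with no matching object
    fn = len(ground_truth) - tp  # objects that were missed
    precision = tp / (tp + fp) if predictions else 0.0
    recall = tp / (tp + fn) if ground_truth else 0.0
    return precision, recall


# Made-up example data: two objects, three predictions (one spurious).
gt_boxes = [(0, 0, 10, 10), (20, 20, 30, 30)]
preds = [((0, 0, 10, 10), 0.9), ((21, 21, 31, 31), 0.8), ((50, 50, 60, 60), 0.7)]
p, r = precision_recall(preds, gt_boxes)
```

The same `iou` function hints at why small objects are hard to localize: a 4x4 pixel box shifted horizontally by 2 pixels, `iou((0, 0, 4, 4), (2, 0, 6, 4))`, overlaps its target with an IoU of only 1/3 and would be counted as a miss at the 0.5 threshold, whereas the same 2-pixel shift on a large box barely changes the IoU.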