Small object detection in remote sensing is critical for urban planning, environmental monitoring, and disaster response, but it is hindered by extreme scale variation and cluttered backgrounds. Existing methods often treat feature enhancement and multi-scale fusion separately, which fragments information flow and limits adaptability, while comprehensive contextual modeling typically incurs high computational cost. In this paper, we propose MSFE-YOLO, a unified multi-scale feature enhancement framework designed to address these limitations efficiently. First, we introduce an Adaptive Multi-scale Correlation Attention (AMCA) module into the backbone to capture long-range spatial dependencies and suppress background noise, thereby improving feature discrimination across scales. Second, to bridge the semantic gap between scales, we design a Selective Cross-Attention Fusion (SCAF) module combined with an Adaptive Point Upsampler (APU). This combination replaces static interpolation with content-aware sampling, preserving the fine details of small objects during feature reconstruction and improving feature propagation and fusion. Third, a Multi-scale Dilated Receptive Field (MDRF) module is integrated into the detection head to aggregate multi-scale contextual information without significant additional computation. Experiments on the Remote Sensing Object Detection (RSOD) and Vehicle Detection in Aerial Imagery (VEDAI) datasets show consistent gains over the baseline. On RSOD, MSFE-YOLO improves mAP@50 by 4.6% and mAP@50-95 by 8.9%, while the parameter count rises from 11.17M to 12.7M and inference speed falls from 105.62 FPS to 74.56 FPS. These results indicate a favorable accuracy-efficiency trade-off: the framework improves multi-scale representation and detection accuracy while maintaining inference speeds acceptable for practical remote sensing applications.
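To make the reported efficiency cost concrete, the short sketch below converts the RSOD figures quoted above into relative changes. The helper names are illustrative and not part of MSFE-YOLO, and the per-image latency is an assumption: it treats the reported FPS as the reciprocal of single-image inference time, which the abstract does not state.

```python
def relative_change(old: float, new: float) -> float:
    """Signed change from old to new, as a percentage of old."""
    return (new - old) / old * 100.0


def latency_ms(fps: float) -> float:
    """Per-image latency in milliseconds, ASSUMING latency = 1 / FPS."""
    return 1000.0 / fps


# Figures reported in the abstract for the RSOD dataset.
BASE_PARAMS_M, MSFE_PARAMS_M = 11.17, 12.7   # parameters, in millions
BASE_FPS, MSFE_FPS = 105.62, 74.56           # inference speed

param_change = relative_change(BASE_PARAMS_M, MSFE_PARAMS_M)
fps_change = relative_change(BASE_FPS, MSFE_FPS)
added_latency = latency_ms(MSFE_FPS) - latency_ms(BASE_FPS)

print(f"parameters: {param_change:+.1f}%")
print(f"throughput: {fps_change:+.1f}%")
print(f"added latency per image (assumed): {added_latency:.2f} ms")
```

Under these assumptions, the accuracy gains come with roughly 14% more parameters and about 29% lower throughput, which is the trade-off the abstract characterizes as favorable.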
Feng et al. (Fri,) studied this question.