To address the dense target distribution, large scale variations, and limited feature information of small objects in remote sensing images, this paper proposes a multi-scale fusion network with enhanced target features. First, a multi-layer feature aggregation module is built into the backbone network to strengthen feature extraction. Second, a multi-channel feature fusion module is added to the neck of the network to capture cross-channel information and improve the expressive power of features at different scales. Third, a bi-directional multi-scale feature fusion module combines top-down and bottom-up fusion strategies so that features at different levels can exchange information. Finally, in the detection layer, a fractional Fourier transform is applied to the image to extract additional feature information, which, combined with convolutional operations, improves the accuracy of small object detection. To validate the proposed method, experiments were conducted on the Dataset for Object Detection in Aerial Images (DOTA) and the Northwestern Polytechnical University Very High Resolution 10 (NWPU VHR-10) data sets, where the average detection accuracy reached 78.7% and 95.4%, respectively. Computational complexity was measured at 95.6 G, and the overall model size was 30.7 M. These results indicate that the proposed method combines high detection accuracy with low computational complexity and strong feature representation capability, and that it effectively improves the detection of small objects in remote sensing images.
Shan et al. studied this question.
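The top-down and bottom-up fusion strategy described above can be illustrated with a minimal sketch. It is not the authors' module: the function names (`upsample2x`, `downsample2x`, `bidirectional_fuse`) are hypothetical, feature maps are single-channel nested lists, and plain element-wise addition stands in for whatever learned fusion operator the paper actually uses. It shows only the direction of information flow across pyramid levels.

```python
from typing import List

Grid = List[List[float]]  # one single-channel feature map (assumption: no channel axis)


def upsample2x(x: Grid) -> Grid:
    """Nearest-neighbour upsampling by a factor of 2 in both dimensions."""
    out: Grid = []
    for row in x:
        wide = [v for v in row for _ in range(2)]
        out.append(wide)
        out.append(list(wide))
    return out


def downsample2x(x: Grid) -> Grid:
    """2x2 average pooling with stride 2 (assumes even height and width)."""
    return [
        [(x[i][j] + x[i][j + 1] + x[i + 1][j] + x[i + 1][j + 1]) / 4.0
         for j in range(0, len(x[0]), 2)]
        for i in range(0, len(x), 2)
    ]


def add(a: Grid, b: Grid) -> Grid:
    """Element-wise sum; a stand-in for a learned fusion operator."""
    return [[p + q for p, q in zip(ra, rb)] for ra, rb in zip(a, b)]


def bidirectional_fuse(levels: List[Grid]) -> List[Grid]:
    """Fuse a feature pyramid; levels[0] is finest, each later level is half the size."""
    # Top-down pass: carry coarse, semantic context into the finer levels.
    td = list(levels)
    for i in range(len(levels) - 2, -1, -1):
        td[i] = add(levels[i], upsample2x(td[i + 1]))
    # Bottom-up pass: carry fine, localisation detail back into the coarser levels.
    out = list(td)
    for i in range(1, len(levels)):
        out[i] = add(td[i], downsample2x(out[i - 1]))
    return out


# Usage: a toy three-level pyramid with constant maps of size 4x4, 2x2, and 1x1.
levels = [
    [[1.0] * 4 for _ in range(4)],
    [[2.0] * 2 for _ in range(2)],
    [[4.0]],
]
fused = bidirectional_fuse(levels)
print([len(f) for f in fused])  # spatial sizes are preserved per level
```

After both passes every output level depends on every input level, which is the information interaction the abstract refers to. A real implementation would operate on multi-channel tensors and would typically replace the additions with convolutions or weighted sums.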