ABSTRACT With advances in satellite remote sensing technology, object detection in high-resolution remote sensing imagery has become a prominent research focus in computer vision. Although numerous algorithms have been developed for this task, they still suffer from low detection accuracy and high false positive rates. To address these issues, we propose the multiscale feature fusion network (MSFFNet), a novel architecture with three key components: the Large Selective Kernel Block (LSKBlock), the Space-to-Depth ADown (SPDA) module and the Double Feature Aggregation Neck (DFAN). The LSKBlock adaptively captures salient target features by dynamically adjusting the receptive field size, thereby enhancing detection precision. The SPDA module converts spatial correlations into channel-wise dependencies by segmenting and reordering the feature maps, which helps preserve fine-grained information, suppress background interference and reduce false detections. The DFAN integrates shallow and deep features through a multiscale feature fusion module (MSFFM), enabling the extraction of multiscale target representations and improving overall detection performance. Extensive experiments on three public datasets, SIMD, VisDrone2019 and DIOR, demonstrate the effectiveness of our approach: compared with the YOLOv9s baseline, MSFFNet improves mAP50 by 0.6%, 1.9% and 3.5%, respectively.
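The idea of converting spatial correlations into channel-wise dependencies by segmenting and reordering a feature map can be illustrated with the generic space-to-depth rearrangement. The sketch below is not the SPDA module itself, whose internals the abstract does not specify; the function name `space_to_depth`, the block size of 2 and the (C, H, W) layout are assumptions made for illustration only.

```python
import numpy as np


def space_to_depth(x: np.ndarray, block: int = 2) -> np.ndarray:
    """Rearrange a (C, H, W) map into (C * block**2, H // block, W // block).

    Hypothetical helper: a generic space-to-depth step, not the paper's SPDA.
    """
    c, h, w = x.shape
    if h % block or w % block:
        raise ValueError("H and W must be divisible by block")
    # Split each spatial axis into (coarse index, offset inside the block).
    x = x.reshape(c, h // block, block, w // block, block)
    # Move the two in-block offsets to the front so they become channels.
    x = x.transpose(2, 4, 0, 1, 3)
    return x.reshape(c * block * block, h // block, w // block)


# A single-channel 4x4 map becomes four 2x2 maps, one per in-block offset.
feature_map = np.arange(16, dtype=np.float32).reshape(1, 4, 4)
out = space_to_depth(feature_map)
print(out.shape)
```

Unlike strided convolution or pooling, this rearrangement halves the spatial resolution without discarding any value: every input pixel survives in one of the output channels, which is consistent with the stated goal of preserving fine-grained information.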
Zong et al. (Fri,) studied this question.