What question did this study set out to answer?

The study set out to improve detection accuracy and reduce false positives when detecting objects in remote sensing imagery.

January 18, 2026 | Open Access

MSFFNet: Multiscale Feature Fusion Network for Small Target Detection in Remote Sensing Images

Key Points

Aimed to enhance detection accuracy and reduce false positives in remote sensing object detection.
Developed a novel architecture, MSFFNet, comprising three main components: the LSKBlock, the SPDA module, and the DFAN.
Implemented the LSKBlock to capture salient target features by dynamically adjusting the receptive field size.
Used the SPDA module to convert spatial correlations into channel-wise dependencies by segmenting and reordering the feature maps.
Integrated shallow and deep features in the DFAN through a multiscale feature fusion module (MSFFM).
Achieved mAP50 improvements of 0.6%, 1.9%, and 3.5% over the YOLOv9s baseline on SIMD, VisDrone2019, and DIOR, respectively.
Demonstrated the approach's effectiveness on three public datasets: SIMD, VisDrone2019, and DIOR.
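
For readers unfamiliar with the metric behind these numbers: by the usual convention, mAP50 treats a predicted box as a correct detection only when its intersection over union (IoU) with a ground-truth box is at least 0.5. The sketch below illustrates that single test. The names `Box`, `iou`, and `is_true_positive` are hypothetical helpers written for this illustration and are not taken from the paper. A full mAP computation also ranks predictions by confidence, matches each ground-truth box at most once, and averages precision over recall levels and classes.

```python
from typing import Tuple

# Illustrative box format (an assumption): (x_min, y_min, x_max, y_max).
Box = Tuple[float, float, float, float]


def area(box: Box) -> float:
    """Area of an axis-aligned box; degenerate boxes count as zero."""
    return max(0.0, box[2] - box[0]) * max(0.0, box[3] - box[1])


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = area(a) + area(b) - inter
    return inter / union if union > 0 else 0.0


def is_true_positive(pred: Box, truth: Box, threshold: float = 0.5) -> bool:
    """Apply the IoU >= 0.5 test that the '50' in mAP50 refers to."""
    return iou(pred, truth) >= threshold


truth = (0.0, 0.0, 10.0, 10.0)
close_pred = (2.0, 0.0, 12.0, 10.0)  # overlaps most of the ground truth
far_pred = (6.0, 0.0, 16.0, 10.0)    # overlaps less than half of it

print(is_true_positive(close_pred, truth))  # IoU = 80 / 120
print(is_true_positive(far_pred, truth))    # IoU = 40 / 160
```

Small targets make this test strict: a shift of a few pixels on a tiny box can push the IoU below 0.5, which is one reason small-object detection is hard to score well on.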

Abstract

With the advancement of satellite remote sensing technology, object detection based on high‐resolution remote sensing imagery has emerged as a prominent research focus in the field of computer vision. Although numerous algorithms have been developed for remote sensing image object detection, they still suffer from challenges such as low detection accuracy and high false positive rates. To address these issues, we propose a novel architecture, the multiscale feature fusion network (MSFFNet). MSFFNet is composed of three key components: the Large Selective Kernel Block (LSKBlock), the Space‐to‐Depth ADown (SPDA) module and the Double Feature Aggregation Neck (DFAN). Specifically, the LSKBlock adaptively captures salient target features by dynamically adjusting the receptive field size, thereby enhancing detection precision. The SPDA module converts spatial correlations into channel‐wise dependencies by segmenting and reordering the feature maps, which helps preserve fine‐grained information, suppress background interference and reduce false detections. Furthermore, the DFAN integrates shallow and deep features through a multiscale feature fusion module (MSFFM), enabling the extraction of multiscale target representations and improving overall detection performance. Extensive experiments on public datasets, SIMD, VisDrone2019 and DIOR, demonstrate the effectiveness of our approach. Compared with the YOLOv9s baseline model, MSFFNet achieves improvements in mAP50% of 0.6%, 1.9% and 3.5%, respectively.
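
The "segmenting and reordering" step attributed to the SPDA module can be illustrated with a generic space-to-depth rearrangement. The sketch below is not the paper's SPDA implementation, which is not given here. It assumes a block size of 2 and a plain nested-list feature map, and the names `FeatureMap` and `space_to_depth` are hypothetical. It shows only how downsampling can move spatial neighbours into channels instead of discarding them.

```python
from typing import List

# Assumed layout for illustration: feature[channel][row][column].
FeatureMap = List[List[List[float]]]


def space_to_depth(x: FeatureMap, block: int = 2) -> FeatureMap:
    """Move each block x block spatial neighbourhood into separate channels.

    A (C, H, W) input becomes (C * block * block, H // block, W // block).
    Every input value is kept; values only move from space to channels.
    """
    height, width = len(x[0]), len(x[0][0])
    if height % block or width % block:
        raise ValueError("height and width must be divisible by block")

    out: FeatureMap = []
    # One strided sub-map of every channel per (row offset, column offset).
    for dy in range(block):
        for dx in range(block):
            for channel in x:
                out.append([row[dx::block] for row in channel[dy::block]])
    return out


# A single 4x4 channel holding the values 0..15 in row-major order.
feature = [[[float(4 * r + c) for c in range(4)] for r in range(4)]]
packed = space_to_depth(feature)

# Shape goes from (1, 4, 4) to (4, 2, 2).
print(len(packed), len(packed[0]), len(packed[0][0]))
```

Unlike strided convolution or pooling, this rearrangement halves the resolution without dropping any pixel. That is consistent with the abstract's claim that the module helps preserve fine-grained information, although the actual SPDA design may differ in its details.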
