Remote sensing object detection (RSOD) faces challenges such as large variations in target scale, diverse orientations, and complex backgrounds. Existing approaches struggle to simultaneously balance local feature extraction and global context modeling, while also failing to capture fine-grained semantic information across channel dimensions. To address these issues, we propose a novel remote sensing object detection backbone network, FI-MambaNet. Specifically, we design the Multi-Scale Architecture-Aware Mamba module, which combines multi-scale convolutions with multi-directional architecture-aware scanning strategies to capture both local details and long-range spatial correlations. Additionally, we introduce the Multi-granularity Contextual Self-Attention module, which employs multi-branch convolutions with varying receptive fields and strides. This simultaneously enhances semantic discrimination and models channel-level context. These modules enable efficient spatial–channel interactions within the FIBlock architecture. Extensive testing on the HRSC2016, DOTA-v1.0 and DOTA-v1.5 datasets demonstrates that FI-MambaNet achieves detection performance surpassing baseline methods while maintaining high computational efficiency. This validates its potential for handling multi-scale complex scenes in remote sensing object detection.
Liu et al. (Sat,) studied this question.