ABSTRACT Remote sensing image segmentation currently faces two challenges: balancing global and local information, and reducing redundant information in the shallow features of encoder‐decoder architectures. To address these problems, we propose a semantic segmentation network for remote sensing images, the Multi‐Scale Sliding Refined Network (MSSRNet), which includes a Multi‐Scale Sliding Transmit Attention (MSSTA) module and a Shallow Feature Refinement (SFR) module. MSSTA enhances both local detail extraction and long‐range dependency modeling by combining sliding window attention (SWA) with standard self‐attention (SA) while integrating multi‐scale features. SFR reduces redundant information in shallow features and strengthens feature representation by reconstructing spatial information and enhancing channel information. We validated MSSRNet on three publicly available datasets: ISPRS Vaihingen, ISPRS Potsdam, and LoveDA. Experimental results show that MSSRNet achieves state‐of‐the‐art mIoU scores of 85.26%, 87.93%, and 55.97% on these datasets, respectively. Compared with recent high‐performance models such as MMT, MSSRNet improves mIoU by 1.1% on the Vaihingen dataset while reducing the parameter count by approximately 34.6%, demonstrating a better balance between accuracy and computational efficiency.
Gao et al. (Wed,) studied this question.
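For readers unfamiliar with the mIoU metric reported in the abstract, the following is a minimal sketch of how mean intersection-over-union is commonly computed from predicted and reference label maps. It is not taken from the paper: the function name `mean_iou`, the flattened-list input format, and the toy labels are assumptions made for illustration. Benchmark evaluations typically accumulate counts over the whole test set and may exclude certain classes, which this sketch does not do.

```python
def mean_iou(pred, target, num_classes):
    """Mean IoU over the classes that occur in pred or target.

    pred and target are flat sequences of integer class labels
    (hypothetical input format chosen for this illustration).
    """
    ious = []
    for c in range(num_classes):
        # Pixels where both prediction and reference are class c
        inter = sum(1 for p, t in zip(pred, target) if p == c and t == c)
        # Pixels where either prediction or reference is class c
        union = sum(1 for p, t in zip(pred, target) if p == c or t == c)
        if union > 0:  # skip classes absent from both maps
            ious.append(inter / union)
    return sum(ious) / len(ious) if ious else 0.0


# Toy flattened label maps with three classes (illustrative values only)
pred = [0, 0, 1, 1, 2, 2]
target = [0, 1, 1, 1, 2, 0]
score = mean_iou(pred, target, num_classes=3)
```

Here the per-class IoU values are 1/3, 2/3, and 1/2, so `score` is their average. Averaging per class rather than per pixel keeps small classes from being dominated by large ones, which is one reason the metric is widely used for land-cover segmentation.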