Semantic segmentation of high-resolution remote sensing images is an important foundation for urban mapping and land-cover interpretation. However, objects in remote sensing scenes usually exhibit large variations in scale, high intra-class variance, and complex background interference. As a result, existing methods still suffer from insufficient global semantic modeling, blurred boundaries, and missed small objects in complex high-resolution scenes. To address these challenges, this paper proposes a Global Category-Center Prior-Guided Spatial-Frequency Collaborative Network (GC2F-Net). Specifically, ResNet-50 is adopted as the encoder, and a Global Category-Center Module generates a global category-center prior from the deep features. This prior is combined with a Fourier Global Enhancement Module to enhance the deep features in the frequency domain. In the decoding stage, a Local Category-Aware Frequency Attention Module progressively refines the feature representations under the guidance of the global category-center prior, jointly improving global semantic consistency and local detail recovery. Experimental results demonstrate that GC2F-Net achieves robust and competitive segmentation performance on multiple public remote sensing semantic segmentation datasets. The proposed method provides an effective spatial-frequency collaborative modeling paradigm for the semantic segmentation of high-resolution remote sensing images.
Li et al. (Sat,) studied this question.
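
To make the two global components of the abstract more concrete, the following is a minimal NumPy sketch and not the authors' implementation. It assumes that a category-center prior can be formed as a softmax-weighted average of deep features per class, and that frequency-domain enhancement can be modeled as a per-frequency reweighting of the 2D spectrum with a residual connection. The function names `category_centers` and `fourier_enhance`, the tensor shapes, and the fixed all-ones filter (standing in for learnable weights) are all hypothetical choices made for illustration.

```python
import numpy as np


def category_centers(features, logits):
    """Assumed form of a global category-center prior.

    features: (C, H, W) deep feature map.
    logits:   (K, H, W) coarse per-class scores (hypothetical auxiliary head).
    Returns a (K, C) array: one feature-space center per class, computed as
    the class-probability-weighted mean of the pixel features.
    """
    C, H, W = features.shape
    K = logits.shape[0]
    f = features.reshape(C, H * W)                 # (C, N)
    z = logits.reshape(K, H * W)                   # (K, N)
    z = z - z.max(axis=0, keepdims=True)           # numerically stable softmax
    p = np.exp(z)
    p /= p.sum(axis=0, keepdims=True)              # class probabilities per pixel
    w = p / (p.sum(axis=1, keepdims=True) + 1e-8)  # normalize over pixels per class
    return w @ f.T                                 # (K, C)


def fourier_enhance(features, freq_weight):
    """Assumed form of a frequency-domain enhancement step.

    features:    (C, H, W) real-valued feature map.
    freq_weight: (C, H, W // 2 + 1) real filter applied to the rFFT spectrum
                 (learnable in a real network; fixed here).
    Returns features plus the filtered features (residual connection).
    """
    H, W = features.shape[-2:]
    spec = np.fft.rfft2(features, axes=(-2, -1))
    filtered = np.fft.irfft2(spec * freq_weight, s=(H, W), axes=(-2, -1))
    return features + filtered


# Toy usage with random inputs: 8 channels, 3 classes, 16 x 16 feature map.
rng = np.random.default_rng(0)
feats = rng.standard_normal((8, 16, 16))
scores = rng.standard_normal((3, 16, 16))
centers = category_centers(feats, scores)
enhanced = fourier_enhance(feats, np.ones((8, 16, 9)))
print(centers.shape, enhanced.shape)
```

In this sketch the centers summarize each class globally, so they could later be compared with decoder features to guide refinement. How GC2F-Net actually computes and injects the prior is not specified in the abstract and is not implied by this example.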