Semantic segmentation of remote sensing imagery is crucial for applications such as land resource management and urban planning, yet it remains challenging due to low inter-class variation, ambiguous boundaries, and the coexistence of multi-scale geospatial features. To tackle these issues, we propose GS-USTNet, a novel architecture that enhances both feature representation and boundary recovery. First, we introduce a Global–Local Adaptive Convolution (GLAConv) module that dynamically fuses global contextual cues with local responses to generate content-aware convolutional weights, thereby improving feature discriminability. Second, we design a Skip-Guided Attention (SGA) mechanism that leverages spatial–channel joint attention to guide the decoder, mitigating attention dispersion in complex scenes or under class imbalance and sharpening object boundaries. Built upon the efficient USTNet framework, our model improves accuracy without compromising computational efficiency. Extensive experiments on benchmark datasets demonstrate consistent improvements over the original USTNet, with gains of approximately 3.5% in overall accuracy and 6.0% in F1-score across datasets. Ablation studies further confirm the effectiveness of the proposed GLAConv and SGA modules. This work provides an efficient and robust approach for fine-grained semantic segmentation of high-resolution remote sensing imagery.
Qian et al. studied this question.
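To make the idea of content-aware convolutional weights more concrete, the following is a minimal sketch in plain Python. It is not the GLAConv module itself, whose internals the abstract does not specify. It substitutes a simpler, well-known scheme: a global average of the input drives a softmax gate that mixes a small bank of candidate kernels, and the mixed kernel is then applied as an ordinary convolution. All names (`content_aware_conv`, `gate_scale`, `gate_bias`) and the single-channel setting are assumptions made for illustration, and the local-response branch described above is omitted.

```python
import math

def global_average(x):
    """Mean over all pixels of a 2D map; stands in for the global context cue."""
    return sum(sum(row) for row in x) / (len(x) * len(x[0]))

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def mix_kernels(kernels, weights):
    """Weighted sum of equally sized square kernels, giving one mixed kernel."""
    size = len(kernels[0])
    return [[sum(w * k[i][j] for w, k in zip(weights, kernels))
             for j in range(size)] for i in range(size)]

def conv2d_same(x, kernel):
    """Zero-padded 'same' cross-correlation with a square, odd-sized kernel."""
    h, w, r = len(x), len(x[0]), len(kernel) // 2
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            acc = 0.0
            for di in range(-r, r + 1):
                for dj in range(-r, r + 1):
                    ii, jj = i + di, j + dj
                    if 0 <= ii < h and 0 <= jj < w:
                        acc += x[ii][jj] * kernel[di + r][dj + r]
            out[i][j] = acc
    return out

def content_aware_conv(x, kernels, gate_scale, gate_bias):
    """Hypothetical helper: gate the kernel bank by the input's global mean.

    gate_scale and gate_bias hold one value per candidate kernel and play
    the role of a (here hand-set, normally learned) linear gating layer.
    """
    g = global_average(x)
    attn = softmax([s * g + b for s, b in zip(gate_scale, gate_bias)])
    return conv2d_same(x, mix_kernels(kernels, attn)), attn

# Two candidate kernels: identity (keeps detail) and a 3x3 box blur (smooths).
IDENTITY = [[0.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 0.0]]
BOX_BLUR = [[1.0 / 9.0] * 3 for _ in range(3)]

if __name__ == "__main__":
    image = [[1.0, 2.0], [3.0, 4.0]]
    out, attn = content_aware_conv(image, [IDENTITY, BOX_BLUR], [1.0, -1.0], [0.0, 0.0])
    print("gate weights:", attn)
    print("output:", out)
```

In this toy setting, a brighter input shifts the gate toward the identity kernel and a darker one toward the blur, so the effective kernel depends on the image content. A trained module would learn the gating parameters and operate on multi-channel feature maps.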