Remote sensing image (RSI) semantic segmentation is challenged by high inter-class spectral similarity, significant intra-class scale variation, and limited availability of labeled data. Although semi-supervised learning has reduced the dependency on large-scale annotations, existing approaches still suffer from degraded boundary precision and incomplete geometric structures in complex remote sensing scenes. To address these issues, this paper proposes a Multi-scale Consistency and Cross-Attention Teacher–Student Network (MSCA-TSN) for semi-supervised RSI semantic segmentation. Specifically, an Adaptive Multi-scale Uncertainty Consistency module (AMUC) is introduced to model feature reliability across hierarchical levels. By leveraging Monte Carlo Dropout to estimate feature uncertainty and employing adaptive weighting for multi-scale consistency learning, AMUC effectively suppresses unreliable supervision and improves segmentation robustness under significant scale variations. Furthermore, a Cross-Teacher–Student Cross-Attention Module (CCAM) is designed to enhance cross-network feature interaction. In CCAM, student features act as queries while teacher features serve as keys and values to construct cross-attention, enabling the student network to reconstruct more discriminative feature representations and reduce confusion among visually similar land-cover categories. Extensive experiments are conducted on the LoveDA and ISPRS Potsdam benchmarks under both 5% and 10% labeling ratios. On the LoveDA dataset, MSCA-TSN achieves mIoU scores of 51.05% and 52.41% under 5% and 10% labeled data, respectively, outperforming several state-of-the-art semi-supervised methods. On the ISPRS Potsdam dataset, the proposed method further reaches 75.35% and 76.34% mIoU under the same settings. Ablation and parameter sensitivity analyses further verify the effectiveness and robustness of the proposed AMUC and CCAM modules.
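The two mechanisms named above can be illustrated with a minimal sketch; it is not the authors' implementation. It assumes identity query/key/value projections, a single feature scale, and an exp(-variance) reliability weight, none of which are specified in the abstract. The function names `cross_attention`, `mc_variance`, and `weighted_consistency` are hypothetical.

```python
import math


def softmax(xs):
    # Numerically stable softmax over a list of floats.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(student, teacher):
    """Scaled dot-product cross-attention.

    Student tokens act as queries; teacher tokens act as keys and values.
    Assumption: no learned projections (identity Q/K/V), for brevity.
    student: list of N vectors of dim d; teacher: list of M vectors of dim d.
    """
    d = len(student[0])
    scale = 1.0 / math.sqrt(d)
    out = []
    for q in student:
        scores = [scale * sum(qi * ki for qi, ki in zip(q, k)) for k in teacher]
        weights = softmax(scores)
        # Each output token is a convex combination of teacher value vectors.
        out.append([sum(w * v[j] for w, v in zip(weights, teacher)) for j in range(d)])
    return out


def mc_variance(samples):
    """Per-element mean and variance over T stochastic forward passes.

    samples: list of T prediction vectors, as would be produced by running
    a network T times with dropout left active (Monte Carlo Dropout).
    """
    t = len(samples)
    n = len(samples[0])
    means = [sum(s[i] for s in samples) / t for i in range(n)]
    variances = [sum((s[i] - means[i]) ** 2 for s in samples) / t for i in range(n)]
    return means, variances


def weighted_consistency(student_pred, teacher_samples):
    """Uncertainty-weighted squared-error consistency at one scale.

    Assumption: reliability weight = exp(-variance), so elements where the
    teacher's stochastic passes disagree contribute less to the loss.
    """
    means, variances = mc_variance(teacher_samples)
    weights = [math.exp(-v) for v in variances]
    num = sum(w * (s - m) ** 2 for w, s, m in zip(weights, student_pred, means))
    return num / sum(weights)
```

In a multi-scale setting, a loss of this form would be computed per feature level and the levels combined with adaptive weights; how AMUC sets those weights is not stated in the abstract, so that step is left out here.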
Cao et al. studied this question.