April 23, 2026 | Open Access

Semi-Supervised Remote Sensing Image Semantic Segmentation Based on Multi-Scale Consistency and Cross-Attention

Key Points

The aim is to improve semantic segmentation in remote sensing images using semi-supervised learning techniques.
Developed a Multi-scale Consistency and Cross-Attention Teacher–Student Network (MSCA-TSN) for segmentation.
Introduced Adaptive Multi-scale Uncertainty Consistency (AMUC) to model feature reliability across hierarchies.
Implemented a Cross-Teacher–Student Cross-Attention Module (CCAM) to enhance feature interaction between networks.
Achieved mIoU scores of 51.05% and 52.41% on the LoveDA dataset with 5% and 10% labeled data, respectively.
On the ISPRS Potsdam dataset, achieved mIoU scores of 75.35% and 76.34% under the same 5% and 10% labeling ratios.
Ablation studies confirmed the effectiveness of both AMUC and CCAM modules.
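
The results above are reported as mean Intersection over Union (mIoU). As a reminder of what that metric measures, here is a minimal sketch, assuming flattened lists of integer class labels and averaging only over classes that occur in either the prediction or the ground truth. The function name `mean_iou` is illustrative and not taken from the study; benchmark evaluations typically accumulate the same counts over a whole test set before averaging.

```python
def mean_iou(pred, target, num_classes):
    """Mean IoU over the classes that occur in pred or target.

    pred, target: equal-length sequences of integer class labels
    (for example, a flattened segmentation map).
    """
    inter = [0] * num_classes  # pixels where pred and target agree on class c
    union = [0] * num_classes  # pixels where pred or target is class c
    for p, t in zip(pred, target):
        if p == t:
            inter[p] += 1
            union[p] += 1
        else:
            union[p] += 1
            union[t] += 1
    # Skip classes that never occur, so they do not distort the average.
    ious = [i / u for i, u in zip(inter, union) if u > 0]
    return sum(ious) / len(ious) if ious else 0.0


# Toy example with three classes and six pixels.
score = mean_iou([0, 0, 1, 1, 2, 2], [0, 1, 1, 1, 2, 0], num_classes=3)
```

In the toy example the per-class IoU values are 1/3, 2/3, and 1/2, so `score` is 0.5.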

Abstract

Remote sensing image (RSI) semantic segmentation is challenged by high inter-class spectral similarity, significant intra-class scale variation, and limited availability of labeled data. Although semi-supervised learning has reduced the dependency on large-scale annotations, existing approaches still suffer from degraded boundary precision and incomplete geometric structures in complex remote sensing scenes. To address these issues, this paper proposes a Multi-scale Consistency and Cross-Attention Teacher–Student Network (MSCA-TSN) for semi-supervised RSI semantic segmentation. Specifically, an Adaptive Multi-scale Uncertainty Consistency module (AMUC) is introduced to model feature reliability across hierarchical levels. By leveraging Monte Carlo Dropout to estimate feature uncertainty and employing adaptive weighting for multi-scale consistency learning, AMUC effectively suppresses unreliable supervision and improves segmentation robustness under significant scale variations. Furthermore, a Cross-Teacher–Student Cross-Attention Module (CCAM) is designed to enhance cross-network feature interaction. In CCAM, student features act as queries while teacher features serve as keys and values to construct cross-attention, enabling the student network to reconstruct more discriminative feature representations and reduce confusion among visually similar land-cover categories. Extensive experiments are conducted on the LoveDA and ISPRS Potsdam benchmarks under both 5% and 10% labeling ratios. On the LoveDA dataset, MSCA-TSN achieves mIoU scores of 51.05% and 52.41% under 5% and 10% labeled data, respectively, outperforming several state-of-the-art semi-supervised methods. On the ISPRS Potsdam dataset, the proposed method further reaches 75.35% and 76.34% mIoU under the same settings. Ablation and parameter sensitivity analyses further verify the effectiveness and robustness of the proposed AMUC and CCAM modules.
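
The query/key/value arrangement described for CCAM can be illustrated with plain scaled dot-product cross-attention. The sketch below is not the authors' implementation: it assumes a single attention head, no learned projection matrices, and feature maps already flattened into lists of token vectors. The names `cross_attention`, `student`, and `teacher` are hypothetical.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(student, teacher):
    """Scaled dot-product cross-attention.

    student: n token vectors of length d, used as queries.
    teacher: m token vectors of length d, used as both keys and values.
    Returns n vectors of length d, each a weighted mix of teacher tokens.
    Assumption: no learned Q/K/V projections, single head.
    """
    d = len(student[0])
    scale = 1.0 / math.sqrt(d)
    out = []
    for q in student:
        # Similarity of this student token to every teacher token.
        scores = [scale * sum(qi * ki for qi, ki in zip(q, k)) for k in teacher]
        weights = softmax(scores)
        # Rebuild the student token as a weighted sum of teacher values.
        out.append([sum(w * v[j] for w, v in zip(weights, teacher))
                    for j in range(d)])
    return out


# A zero query attends uniformly, so the output is the mean teacher token.
mixed = cross_attention([[0.0, 0.0]], [[1.0, 2.0], [3.0, 4.0]])
```

Each output row is a convex combination of teacher tokens, which is the sense in which the student's features are reconstructed from the teacher's. A full module would normally add learned projections and may combine the result with the original student features; the abstract does not specify those details.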
