What question did this study set out to answer?

The research aims to improve semantic segmentation in remote sensing images by using a semi-supervised learning approach to overcome label scarcity issues.

February 5, 2026 | Open Access

A Semi-Supervised Transformer with a Curriculum Training Pipeline for Remote Sensing Image Semantic Segmentation

Key Points

Developed a Curriculum-based Self-supervised and Semi-supervised Pipeline (CSSP) with an easy-to-hard training strategy.
Implemented in-domain pretraining for robust feature representation.
Designed a finetuning stage to prevent overfitting.
Integrated Difficulty-Adaptive ClassMix (DA-ClassMix) augmentation to dynamically reinforce underperforming categories.
Applied a Progressive Intensity Adaptation (PIA) strategy that systematically escalates augmentation strength to improve generalization.
Achieved 82.16% mean Intersection over Union (mIoU) with only 1/32 of the labeled data on the Potsdam dataset.
CSSP nearly matched fully supervised performance (82.24%).
Extended the approach to semi-supervised domain adaptation (SSDA) as Cross-Domain CSSP (CDCSSP), which outperformed existing SSDA and unsupervised domain adaptation (UDA) methods.
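The mIoU figures quoted above follow the standard definition: per-class intersection over union, averaged over classes. The sketch below computes it from flat label lists in plain Python. The function names are illustrative and not taken from the paper, and the paper's exact evaluation protocol (for example, which classes are ignored) is not stated here.

```python
from typing import List, Sequence


def confusion_matrix(y_true: Sequence[int], y_pred: Sequence[int],
                     num_classes: int) -> List[List[int]]:
    """Count pixels: rows are ground-truth classes, columns are predictions."""
    cm = [[0] * num_classes for _ in range(num_classes)]
    for t, p in zip(y_true, y_pred):
        cm[t][p] += 1
    return cm


def per_class_iou(cm: List[List[int]]) -> List[float]:
    """IoU_c = TP / (TP + FP + FN); NaN for classes absent from both inputs."""
    n = len(cm)
    ious = []
    for c in range(n):
        tp = cm[c][c]
        fn = sum(cm[c]) - tp                        # true class c, predicted otherwise
        fp = sum(cm[r][c] for r in range(n)) - tp   # predicted c, true class otherwise
        denom = tp + fp + fn
        ious.append(tp / denom if denom else float("nan"))
    return ious


def mean_iou(cm: List[List[int]]) -> float:
    """Average IoU over the classes that actually occur (NaN entries skipped)."""
    valid = [v for v in per_class_iou(cm) if v == v]  # v == v is False for NaN
    return sum(valid) / len(valid)


# Toy example with three classes and six pixels.
y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 1, 1, 1, 2, 0]
cm = confusion_matrix(y_true, y_pred, num_classes=3)
print(mean_iou(cm))  # per-class IoU is 1/3, 2/3, 1/2
```

In the toy example the three per-class IoU values average to 0.5.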

Abstract

Semantic segmentation of remote sensing images is crucial for geospatial applications but is severely hampered by the prohibitive cost of pixel-level annotations. Although semi-supervised learning (SSL) offers a solution by leveraging unlabeled data, its application to Vision Transformers (ViTs) often encounters overfitting and even training instability under extreme label scarcity. To tackle these challenges, we propose a Curriculum-based Self-supervised and Semi-supervised Pipeline (CSSP). The pipeline adopts a staged, easy-to-hard training strategy, commencing with in-domain pretraining for robust feature representation, followed by a carefully designed finetuning stage to prevent overfitting. The pipeline further integrates a novel Difficulty-Adaptive ClassMix (DA-ClassMix) augmentation that dynamically reinforces underperforming categories and a Progressive Intensity Adaptation (PIA) strategy that systematically escalates augmentation strength to maximize model generalization. Extensive evaluations on the Potsdam, Vaihingen, and Inria datasets demonstrate state-of-the-art performance. Notably, with only 1/32 of the labeled data on the Potsdam dataset, the CSSP reaches 82.16% mIoU, nearly matching the fully supervised result (82.24%). Furthermore, we extend the CSSP to a semi-supervised domain adaptation (SSDA) scenario, termed Cross-Domain CSSP (CDCSSP), which outperforms existing SSDA and unsupervised domain adaptation (UDA) methods. This work establishes a stable and highly effective framework for training ViT-based segmentation models with minimal annotation overhead.
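The abstract describes DA-ClassMix and PIA only at a high level, so the following is a minimal sketch under stated assumptions and not the authors' implementation. It assumes that class difficulty is measured as 1 minus a running per-class IoU, that harder classes are sampled more often as paste sources in a ClassMix-style mix, and that augmentation strength ramps up linearly over training. All function names, the `eps` floor, and the `s_min`/`s_max` bounds are hypothetical.

```python
import random
from typing import Dict, List, Set, Tuple

Grid = List[List[int]]  # a tiny 2-D image or label map as nested lists


def pick_classes(label: Grid, class_iou: Dict[int, float],
                 rng: random.Random, eps: float = 0.05) -> Set[int]:
    """Pick half of the classes present in `label`, favouring low-IoU classes.

    ASSUMPTION: sampling weight = (1 - IoU) + eps; the paper may differ.
    """
    pool = sorted({c for row in label for c in row})
    weights = [(1.0 - class_iou.get(c, 0.0)) + eps for c in pool]
    k = max(1, len(pool) // 2)
    chosen: Set[int] = set()
    while len(chosen) < k:  # weighted sampling without replacement
        c = rng.choices(pool, weights=weights, k=1)[0]
        i = pool.index(c)
        pool.pop(i)
        weights.pop(i)
        chosen.add(c)
    return chosen


def class_mix(img_a: Grid, lab_a: Grid, img_b: Grid, lab_b: Grid,
              chosen: Set[int]) -> Tuple[Grid, Grid]:
    """Paste pixels of the chosen classes from sample A onto sample B."""
    mixed_img = [[a if la in chosen else b
                  for a, la, b in zip(ra, rla, rb)]
                 for ra, rla, rb in zip(img_a, lab_a, img_b)]
    mixed_lab = [[la if la in chosen else lb
                  for la, lb in zip(rla, rlb)]
                 for rla, rlb in zip(lab_a, lab_b)]
    return mixed_img, mixed_lab


def augmentation_strength(step: int, total_steps: int,
                          s_min: float = 0.1, s_max: float = 1.0) -> float:
    """ASSUMPTION: a linear easy-to-hard ramp from s_min to s_max."""
    frac = min(max(step / max(total_steps, 1), 0.0), 1.0)
    return s_min + (s_max - s_min) * frac


rng = random.Random(0)
lab_a, img_a = [[0, 0], [1, 2]], [[10, 11], [12, 13]]
lab_b, img_b = [[3, 3], [3, 3]], [[20, 21], [22, 23]]
running_iou = {0: 0.9, 1: 0.2, 2: 0.5}  # class 1 is currently the hardest
chosen = pick_classes(lab_a, running_iou, rng)
mixed_img, mixed_lab = class_mix(img_a, lab_a, img_b, lab_b, chosen)
print(chosen, mixed_lab, augmentation_strength(50, 100))
```

In a real pipeline the label map for the unlabeled sample would be a pseudo-label from the model, and the strength value would scale the photometric or geometric perturbations applied to the mixed image. Both are left out here to keep the sketch self-contained.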
