What question did this study set out to answer?

The aim is to improve the detection of semantic changes in remote sensing images by addressing semantic ambiguities and noise.

March 25, 2026 | Open Access

Language-Guided Contrastive Learning and Difference Enhancement for Semantic Change Detection in Remote Sensing Images

Key Points

Developed a lightweight framework called LGDENet.
Implemented language-guided temporal contrastive learning to align visual and textual features.
Introduced a Difference Enhancement Module to reduce noise before feature fusion.
Used depthwise separable convolutions to adaptively isolate and suppress irrelevant variations, such as registration errors.
Achieved a semantic F1 score of 87.90% on the SECOND dataset and 88.71% on the Landsat-SCD dataset.
Demonstrated state-of-the-art performance with a parameter count of 33.45 M.
Offered a better trade-off between accuracy and efficiency than heavy foundation model-based approaches.
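
To make the depthwise separable convolutions mentioned above concrete, the sketch below compares weight counts for a standard convolution and a depthwise separable one. The function names, kernel size, and channel widths are illustrative assumptions, not values reported for LGDENet.

```python
def standard_conv_params(c_in: int, c_out: int, k: int) -> int:
    """Weights in a standard k x k convolution (bias omitted)."""
    return k * k * c_in * c_out


def depthwise_separable_params(c_in: int, c_out: int, k: int) -> int:
    """Weights in a depthwise k x k conv followed by a pointwise 1 x 1 conv (bias omitted)."""
    depthwise = k * k * c_in   # one k x k spatial filter per input channel
    pointwise = c_in * c_out   # 1 x 1 convolution that mixes channels
    return depthwise + pointwise


# Hypothetical layer: 3 x 3 kernel, 64 input and 64 output channels.
std = standard_conv_params(64, 64, 3)
sep = depthwise_separable_params(64, 64, 3)
print(std, sep, round(std / sep, 2))  # prints: 36864 4672 7.89
```

The factorization separates spatial filtering (depthwise step) from channel mixing (pointwise step), which is the channel-spatial decoupling the abstract refers to. How the Difference Enhancement Module uses that decoupling to suppress noise is not specified in this summary.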

Abstract

Semantic change detection (SCD) in remote sensing images aims not only to localize changed regions but also to identify their specific “from–to” semantic transitions. This task remains challenging due to the inherent semantic ambiguity of spectral changes and the presence of pseudo-change noise. While recent vision–language models have shown promise in remote sensing, existing approaches such as RemoteCLIP predominantly focus on static scene classification and lack the ability to explicitly model dynamic temporal transitions. Other adaptations of foundation models (e.g., AdaptVFMs-RSCD) often rely on heavy backbones, incurring prohibitive computational costs. To address these limitations, this paper proposes LGDENet, a lightweight, end-to-end framework that unifies Language-Guided Temporal Contrastive Learning with a noise-robust difference enhancement mechanism. Specifically, we construct a temporal transition prompt learning strategy that aligns visual difference features with textual descriptions of dynamic processes, thereby resolving directional semantic ambiguities. Furthermore, we introduce a Difference Enhancement Module (DEM) that leverages the channel–spatial decoupling property of depthwise separable convolutions to adaptively isolate and suppress irrelevant variations (e.g., registration errors) before feature fusion. Experiments on the SECOND and Landsat-SCD datasets demonstrate that LGDENet achieves state-of-the-art performance, yielding semantic F1 scores (Fscd) of 87.90% and 88.71%, respectively. Moreover, with a modest parameter count of 33.45 M, it offers a superior trade-off between accuracy and efficiency compared to heavy foundation model-based approaches.
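
As a rough illustration of what aligning visual difference features with transition text can look like, the sketch below computes a generic InfoNCE-style contrastive loss over toy embeddings. This is an assumption about the general family of objective, not the loss defined in the paper; the function names, temperature, prompts, and vectors are all hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def info_nce(visual, text, temperature=0.07):
    """Mean cross-entropy of matching visual[i] to text[i] against all prompts."""
    losses = []
    for i, v in enumerate(visual):
        logits = [cosine(v, t) / temperature for t in text]
        m = max(logits)  # subtract the max for a numerically stable log-sum-exp
        log_denominator = m + math.log(sum(math.exp(x - m) for x in logits))
        losses.append(log_denominator - logits[i])
    return sum(losses) / len(losses)


# Toy text embeddings for two hypothetical transition prompts,
# e.g. "vegetation to building" and "water to bare ground".
text_prompts = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]

# Toy visual difference features: correctly paired, then deliberately swapped.
aligned = [[0.9, 0.1, 0.0], [0.1, 0.9, 0.0]]
swapped = [aligned[1], aligned[0]]

print(info_nce(aligned, text_prompts) < info_nce(swapped, text_prompts))  # prints: True
```

The loss is low when each difference feature is closest to its own transition prompt and high when the pairing is reversed, which is how such an objective can penalize confusing a transition with its opposite direction.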
