Unsupervised change detection (CD) in heterogeneous remote sensing images is intrinsically difficult due to severe sensor-specific discrepancies. In the absence of ground truth, these discrepancies result in ambiguous optimization objectives that make it difficult for models to distinguish true land-cover changes from modality-driven pseudo-changes. To address these challenges, we propose MaskUCD, a novel unsupervised framework that reformulates heterogeneous CD as a dynamic mask-driven constraint scheduling problem. Fundamentally distinct from conventional strategies that enforce selective feature alignment, MaskUCD employs a spatially adaptive optimization mechanism. Specifically, the iteratively refined mask serves as a geometric reference to guide optimization. It enforces strict feature alignment in mask-unchanged regions to suppress modality-induced discrepancies, while simultaneously promoting feature divergence in mask-changed regions to emphasize semantic inconsistencies. In this way, explicit optimization objectives are established, together with an intrinsic interpretability constraint that guides the CD process. This strategy treats the mask as a structural guide for representation learning rather than a ground-truth reference, thereby avoiding error accumulation caused by directly using inaccurate masks as supervisory signals. To facilitate this optimization, we design a specialized asymmetric autoencoder with a hybrid encoder architecture, utilizing multi-scale frequency analysis and global context modeling to enhance feature representation capabilities. Consequently, this design enables the generation of refined and semantically consistent masks, which provide increasingly precise structural guidance, yielding converged and discriminative difference maps. Extensive experiments demonstrate that MaskUCD achieves state-of-the-art performance and superior robustness compared to existing advanced methods.
Xie et al. (Sun,) studied this question.