Masked auto-encoders (MAEs) have been extensively employed in self-supervised hyperspectral image classification (HSIC). However, existing models struggle to learn separable representations because they do not sufficiently prioritize the reconstruction of the central pixel. To address this limitation, we propose the spectral–spatial MAE with central pixel reconstruction (SSMAE-CR), a novel self-supervised framework tailored for HSIC. To capture more comprehensive representations, SSMAE-CR employs a dual-branch architecture comprising a spectral MAE with central pixel reconstruction (SpecMAE-CR) and a spatial MAE with central pixel reconstruction (SpatMAE-CR). SpecMAE-CR emphasizes central pixel reconstruction by measuring the deviation between the central pixels of the reconstructed and original samples. To preserve the holistic nature of the learned latent representations, SpatMAE-CR maps the central pixels of the reconstructed samples back to their original counterparts through an additional linear layer. Comparative experiments on four publicly available datasets show that SSMAE-CR outperforms state-of-the-art methods. Furthermore, we validate the effectiveness of SSMAE-CR by evaluating the mean intra-class and inter-class distances of the learned representations. The results show that prioritizing central pixel reconstruction yields a statistically significant increase in the mean inter-class distance, suggesting enhanced class separability in the representation space.
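The separability check mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not define the two distances, so the definitions here are assumptions: the mean intra-class distance is taken as the average Euclidean distance from each latent vector to its class centroid, and the mean inter-class distance as the average Euclidean distance over all pairs of class centroids. The function names and the toy feature values are hypothetical and do not come from the paper.

```python
import math
from collections import defaultdict


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def class_centroids(features, labels):
    """Return {label: centroid}, where the centroid is the per-dimension mean."""
    groups = defaultdict(list)
    for vec, lab in zip(features, labels):
        groups[lab].append(vec)
    return {
        lab: [sum(col) / len(vecs) for col in zip(*vecs)]
        for lab, vecs in groups.items()
    }


def mean_intra_class_distance(features, labels):
    """Average distance from each sample to its own class centroid (assumed definition)."""
    centroids = class_centroids(features, labels)
    dists = [euclidean(vec, centroids[lab]) for vec, lab in zip(features, labels)]
    return sum(dists) / len(dists)


def mean_inter_class_distance(features, labels):
    """Average distance over all unordered pairs of class centroids (assumed definition)."""
    cents = list(class_centroids(features, labels).values())
    pairs = [(i, j) for i in range(len(cents)) for j in range(i + 1, len(cents))]
    if not pairs:
        raise ValueError("need at least two classes")
    return sum(euclidean(cents[i], cents[j]) for i, j in pairs) / len(pairs)


# Toy 2-D latent vectors for two classes (made-up values, not from the paper).
features = [[0.0, 0.0], [0.0, 2.0], [4.0, 0.0], [4.0, 2.0]]
labels = [0, 0, 1, 1]
intra = mean_intra_class_distance(features, labels)
inter = mean_inter_class_distance(features, labels)
```

Under these assumed definitions, a larger inter-class distance together with a similar or smaller intra-class distance would indicate more separable representations, which is the kind of comparison the abstract describes.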
Wang et al. (Thu,) studied this question.