Masked Image Modeling (MIM) has advanced significantly across various fields in recent years. Previous research in the hyperspectral (HS) domain has typically used conventional Transformers to model spectral sequences, overlooking the impact of local details on HS image classification. Moreover, using raw image features as reconstruction targets poses significant challenges for training. In this study, we address these limitations of MIM methods in the HS domain by focusing on two aspects: the reconstruction target and the feature modeling capabilities of the Vision Transformer (ViT). We propose LFSMIM, a novel and effective method that incorporates two key strategies: (1) filtering out high-frequency components from the reconstruction target to reduce the network’s sensitivity to noise, and (2) enhancing the local and global modeling capabilities of the ViT to capture weakened texture details and exploit global spectral features. LFSMIM outperformed the compared methods in overall accuracy on the Indian Pines, Pavia University, and Houston 2013 datasets, achieving 95.522%, 98.820%, and 98.160%, respectively. The code will be made available at https://github.com/yuweikong/LFSMIM.
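Strategy (1), using a low-frequency version of the input as the MIM reconstruction target, can be illustrated with a minimal NumPy sketch. The abstract does not say how the filter is built, so the following are assumptions rather than the authors' implementation: an ideal circular low-pass mask applied per band in the 2-D spatial Fourier domain, a hypothetical cutoff parameter `keep_ratio`, and hypothetical helpers `low_pass_filter`, `patchify`, and `masked_mse`. The loss is computed only on masked patches, as is usual in MIM.

```python
import numpy as np


def low_pass_filter(cube: np.ndarray, keep_ratio: float = 0.25) -> np.ndarray:
    """Zero out high spatial frequencies of an HS cube shaped (H, W, B).

    keep_ratio (hypothetical) is the cutoff radius, normalized so that 1.0
    equals the Nyquist frequency along one axis. Each band is filtered
    independently.
    """
    cube = np.asarray(cube, dtype=float)
    h, w, _ = cube.shape
    # Unshifted frequency grids (cycles/sample), matching np.fft.fft2 ordering.
    fy = np.fft.fftfreq(h)[:, None]
    fx = np.fft.fftfreq(w)[None, :]
    radius = np.sqrt(fy**2 + fx**2) / 0.5
    mask = (radius <= keep_ratio).astype(float)
    spectrum = np.fft.fft2(cube, axes=(0, 1))
    filtered = np.fft.ifft2(spectrum * mask[:, :, None], axes=(0, 1))
    return filtered.real  # imaginary part is numerical round-off only


def patchify(cube: np.ndarray, p: int) -> np.ndarray:
    """Split (H, W, B) into non-overlapping p x p patches -> (N, p*p*B)."""
    h, w, b = cube.shape
    assert h % p == 0 and w % p == 0, "H and W must be divisible by p"
    x = cube.reshape(h // p, p, w // p, p, b).transpose(0, 2, 1, 3, 4)
    return x.reshape(-1, p * p * b)


def masked_mse(pred: np.ndarray, target: np.ndarray, patch_mask: np.ndarray) -> float:
    """Mean squared error over masked patches only (patch_mask True = masked)."""
    per_patch = ((pred - target) ** 2).mean(axis=1)
    return float(per_patch[patch_mask].mean())


# Usage: build a low-frequency target and score a stand-in decoder output.
rng = np.random.default_rng(0)
cube = rng.random((8, 8, 4))                      # toy HS cube: 8x8 pixels, 4 bands
target = patchify(low_pass_filter(cube, 0.25), p=4)  # (4, 64)
pred = rng.random(target.shape)                   # placeholder for the decoder output
mask = np.array([True, False, True, False])       # which patches were masked
loss = masked_mse(pred, target, mask)
```

Because high-frequency content, where pixel-level noise concentrates, is removed from the target, the network is not penalized for failing to reproduce it. An ideal (brick-wall) mask is used here only for brevity; a smooth filter such as a Gaussian would avoid ringing artifacts.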
Chen et al. (Mon,) studied this question.