What question did this study set out to answer?

The research aims to improve hyperspectral unmixing accuracy and efficiency using a spatial-spectral guided approach.

March 15, 2026 | Open Access

SSTNT: A Spatial–Spectral Similarity Guided Transformer-in-Transformer for Hyperspectral Unmixing

Key Points

Developed a Spatial–Spectral Similarity Guided Transformer-in-Transformer (SSTNT) framework.
Utilized linear self-attention to extract local pixel features within sliding windows.
Employed global attention for contextual information aggregation.
The proposed method demonstrated improved robustness in hyperspectral unmixing tasks.
Extensive experiments validated the architecture's effectiveness on synthetic and real hyperspectral datasets.
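
As a rough illustration of the linear self-attention mentioned in the key points, the sketch below applies a kernelized attention to the pixels of one sliding window. The summary does not say which linear formulation SSTNT uses, so the `elu(x) + 1` feature map, the function names, the 3x3 window, and the absence of learned query/key/value projections are all assumptions made for this example.

```python
import numpy as np

def feature_map(x):
    # elu(x) + 1: a strictly positive feature map (assumed here, not taken from the paper)
    return np.where(x > 0, x + 1.0, np.exp(np.minimum(x, 0.0)))

def linear_attention(q, k, v):
    """Kernelized attention over n tokens.

    q, k: arrays of shape (n, d); v: array of shape (n, d_v).
    The keys and values are summarized once, so the cost grows linearly in n
    instead of forming the full n x n attention matrix.
    """
    phi_q = feature_map(q)
    phi_k = feature_map(k)
    kv = phi_k.T @ v                 # (d, d_v) summary of keys and values
    z = phi_k.sum(axis=0)            # (d,) normalizer
    return (phi_q @ kv) / (phi_q @ z)[:, None]

# Hypothetical 3x3 window: 9 pixel tokens with 16 features each
rng = np.random.default_rng(0)
window = rng.normal(size=(9, 16))
out = linear_attention(window, window, window)
print(out.shape)
```

Each output row is a weighted average of the value rows, which is the same result the quadratic form would give with the same feature map; only the order of the matrix products changes.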

Abstract

Vision Transformers (ViTs), owing to their strong capability in modeling global contextual dependencies, have been widely adopted in hyperspectral image unmixing (HU). However, standard ViTs process images by partitioning them into non-overlapping patches, which disrupts spatial continuity at the pixel level and neglects the fine-grained structural relationships among pixels within local regions. Consequently, effectively capturing the detailed spatial–spectral features required for accurate unmixing remains challenging. Furthermore, the high computational complexity of global self-attention and its sensitivity to noise limit the applicability of conventional Transformers to HU. To address these issues, we propose a spatial–spectral similarity guided Transformer-in-Transformer (SSTNT) framework. The proposed network adopts a modified TNT architecture, in which the inner Transformer employs a linear self-attention (LSA) mechanism to efficiently exploit pixel-level local features within sliding windows, while the outer Transformer preserves global attention to aggregate contextual information, thereby forming a cooperative local–global optimization scheme. Furthermore, a lightweight spatial–spectral similarity module is introduced to enhance the modeling of neighborhood structures. Finally, spectral reconstruction is achieved through a trainable endmember decoder and a normalized abundance estimation module. Extensive experiments conducted on both synthetic and real hyperspectral datasets demonstrate the effectiveness and robustness of the proposed method.
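
The reconstruction step described at the end of the abstract can be sketched as follows, assuming a linear mixing model in which each pixel spectrum is a weighted sum of endmember spectra. The abstract does not state the normalization or the decoder form, so the softmax, the array shapes, and the names `softmax` and `reconstruct` are hypothetical choices for illustration only.

```python
import numpy as np

def softmax(logits, axis=-1):
    # Subtract the row maximum for numerical stability before exponentiating
    shifted = logits - logits.max(axis=axis, keepdims=True)
    e = np.exp(shifted)
    return e / e.sum(axis=axis, keepdims=True)

def reconstruct(logits, endmembers):
    """Map per-pixel scores to abundances and rebuild the spectra.

    logits: (n_pixels, n_endmembers) unnormalized scores from an encoder.
    endmembers: (n_endmembers, n_bands) spectra; in a trainable decoder
    these would be learned weights (an assumption of this sketch).
    """
    abundances = softmax(logits, axis=1)      # nonnegative, each row sums to one
    spectra = abundances @ endmembers         # linear mixing model
    return abundances, spectra

# Hypothetical sizes: 5 pixels, 3 endmembers, 20 spectral bands
rng = np.random.default_rng(0)
logits = rng.normal(size=(5, 3))
endmembers = rng.uniform(0.0, 1.0, size=(3, 20))
abundances, spectra = reconstruct(logits, endmembers)
print(abundances.sum(axis=1))
```

Softmax is one common way to enforce the nonnegativity and sum-to-one constraints on abundances; other normalizations would fit the same interface.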

Cite This Study

Cui et al. studied this question.

synapsesocial.com/papers/69b606ea83145bc643d1d70c
https://doi.org/10.3390/photonics13030276
