Abstract
Hyperspectral image classification has become a pivotal task in remote sensing data processing. However, current deep learning-based methods still struggle to extract discriminative joint spatial-spectral features, especially when handling complex spatial structures and highly variable spectral responses. To address this issue, a novel hyperspectral image classification network (MSNet) is proposed, built on a multiscale spatial-spectral fusion (MSSF) module and a semantic enhancement encoder (SEE). First, principal component analysis is employed to reduce the dimensionality of the hyperspectral image, preserving essential spectral information while mitigating noise interference. Second, the proposed MSSF extracts multiscale spatial features through a dedicated spatial branch while simultaneously mining discriminative spectral information via a parallel spectral branch. The two branches are then fused so that spatial and spectral representations interact, capturing the joint characteristics of land covers across multiple scales. Third, the designed SEE incorporates a multi-head attention mechanism to explicitly model global feature dependencies, thereby enhancing the representation of semantically critical regions. Extensive experiments were conducted on the Pavia University and Salinas datasets. The results show that MSNet reaches overall accuracies of 95.68% and 96.84%, respectively, surpassing existing mainstream methods and validating its effectiveness.
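The PCA step named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `pca_reduce`, the synthetic 8 x 8 x 20 cube, and the choice of 5 retained components are assumptions made for illustration, and the abstract does not state how many components MSNet keeps.

```python
import numpy as np


def pca_reduce(cube, n_components):
    """Project an (H, W, B) hyperspectral cube onto its top principal components.

    Hypothetical helper for illustration; returns the reduced cube and the
    fraction of total variance retained by the kept components.
    """
    h, w, b = cube.shape
    # Treat every pixel as one sample with B spectral features.
    pixels = cube.reshape(-1, b).astype(np.float64)
    centered = pixels - pixels.mean(axis=0)
    # Rows of vt are the principal axes, ordered by decreasing singular value.
    _, s, vt = np.linalg.svd(centered, full_matrices=False)
    reduced = centered @ vt[:n_components].T
    explained = float((s[:n_components] ** 2).sum() / (s ** 2).sum())
    return reduced.reshape(h, w, n_components), explained


# Synthetic stand-in for a real scene (assumption: 8 x 8 pixels, 20 bands).
rng = np.random.default_rng(0)
cube = rng.normal(size=(8, 8, 20))
reduced, explained = pca_reduce(cube, n_components=5)
print(reduced.shape)
```

The spatial layout is untouched: only the band axis shrinks, so spatial patches can still be cut from the reduced cube for later spatial and spectral processing.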
Yu et al. (Sat,) studied this question.