Hyperspectral image classification (HSIC) plays a crucial role in fine-grained Earth observation tasks. However, balancing efficient long-range dependency modeling with the extraction of fine-grained local features remains a significant challenge, primarily due to the inherent high-dimensional spectral redundancy and complex spatial variability of hyperspectral data. Existing modeling paradigms exhibit distinct limitations: Convolutional Neural Networks (CNNs) are constrained by localized receptive fields, while Vision Transformers (ViTs), despite their global receptive capabilities, incur prohibitive quadratic computational complexity. Meanwhile, the emerging Mamba architecture has demonstrated remarkable effectiveness in sequence modeling with linear complexity, but it often lacks sufficient sensitivity to local textures when directly applied to non-causal 2D images. To address these limitations, this paper proposes a novel parallel hybrid architecture termed DualMambaFormer. Deviating from the traditional serial stacking paradigm, the proposed network utilizes a dual-stream design to achieve the complementary fusion of global static attention and dynamic sequence reasoning. Specifically, the model first employs an SS-ResNet for spectral dimensionality reduction and local feature embedding. Subsequently, the architecture bifurcates into a parallel encoding stage: one branch leverages Multi-Head Self-Attention (MHSA) to capture global spatial correlations, while the other introduces a Local Enhanced Mamba (LEM) branch. By integrating State Space Models (SSM) with depthwise separable convolutions, the LEM branch simultaneously captures long-range causal dependencies and local spatial context. Finally, a dual class token fusion strategy is designed to integrate heterogeneous representations at the decision level. Extensive experiments on four benchmark datasets—Indian Pines, Pavia University, Salinas, and WHU-HongHu—show that DualMambaFormer achieves OA values of 96.56%, 98.95%, 97.60%, and 96.09%, respectively, with consistently high AA and Kappa coefficients. These results demonstrate the effectiveness, robustness, and generalization capability of the proposed method for hyperspectral image classification. Compared with the second-best competing methods, DualMambaFormer improves OA by 5.55, 2.30, 1.68, and 4.30 percentage points on the Pavia University, Indian Pines, Salinas, and WHU-HongHu datasets, respectively.
Jiang et al. (Mon,) studied this question.