What question did this study set out to answer?

This research aims to improve hyperspectral image classification by addressing the limitations of existing CNN, Vision Transformer, and Mamba models.

May 13, 2026 | Open Access

DualMambaFormer: A Parallel Hybrid Transformer–Mamba Network for Hyperspectral Image Classification

Key Points

Proposes DualMambaFormer, a hybrid architecture with parallel encoding branches.
Uses SS-ResNet for spectral dimensionality reduction and local feature embedding.
Introduces a Local Enhanced Mamba (LEM) branch that combines state space models with depthwise separable convolutions.
Achieves overall accuracy (OA) of 96.56% on Pavia University, 98.95% on Indian Pines, 97.60% on Salinas, and 96.09% on WHU-HongHu datasets.
Improves OA over the second-best methods by 5.55, 2.30, 1.68, and 4.30 percentage points on Pavia University, Indian Pines, Salinas, and WHU-HongHu, respectively.
Shows high average accuracy (AA) and Kappa coefficients across all datasets.
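
The three metrics named above (OA, AA, and Kappa) are conventionally derived from a confusion matrix. The following is a minimal sketch using the standard definitions; the function name `classification_metrics` and the toy matrix are illustrative assumptions and do not come from the paper.

```python
def classification_metrics(confusion):
    """Return (OA, AA, kappa) for a square confusion matrix.

    confusion[i][j] is the number of samples whose true class is i
    and whose predicted class is j.
    """
    n_classes = len(confusion)
    total = sum(sum(row) for row in confusion)
    if total == 0:
        raise ValueError("confusion matrix contains no samples")

    # Overall accuracy: fraction of all samples on the diagonal.
    correct = sum(confusion[i][i] for i in range(n_classes))
    oa = correct / total

    # Average accuracy: mean per-class recall, skipping empty classes.
    row_totals = [sum(row) for row in confusion]
    recalls = [
        confusion[i][i] / row_totals[i]
        for i in range(n_classes)
        if row_totals[i] > 0
    ]
    aa = sum(recalls) / len(recalls)

    # Cohen's kappa: observed agreement corrected for chance agreement.
    col_totals = [
        sum(confusion[i][j] for i in range(n_classes))
        for j in range(n_classes)
    ]
    expected = sum(r * c for r, c in zip(row_totals, col_totals)) / (total * total)
    kappa = (oa - expected) / (1 - expected) if expected != 1 else 1.0
    return oa, aa, kappa


# Hypothetical 3-class confusion matrix, for illustration only.
toy_confusion = [
    [50, 0, 0],
    [5, 40, 5],
    [0, 0, 10],
]
oa, aa, kappa = classification_metrics(toy_confusion)
print(f"OA={oa:.4f}  AA={aa:.4f}  Kappa={kappa:.4f}")
```

OA is dominated by the largest classes, so AA and Kappa are usually reported alongside it on class-imbalanced hyperspectral scenes.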

Abstract

Hyperspectral image classification (HSIC) plays a crucial role in fine-grained Earth observation tasks. However, balancing efficient long-range dependency modeling with the extraction of fine-grained local features remains a significant challenge, primarily due to the inherent high-dimensional spectral redundancy and complex spatial variability of hyperspectral data. Existing modeling paradigms exhibit distinct limitations: Convolutional Neural Networks (CNNs) are constrained by localized receptive fields, while Vision Transformers (ViTs), despite their global receptive capabilities, incur prohibitive quadratic computational complexity. Meanwhile, the emerging Mamba architecture has demonstrated remarkable effectiveness in sequence modeling with linear complexity, but it often lacks sufficient sensitivity to local textures when directly applied to non-causal 2D images. To address these limitations, this paper proposes a novel parallel hybrid architecture termed DualMambaFormer. Deviating from the traditional serial stacking paradigm, the proposed network utilizes a dual-stream design to achieve the complementary fusion of global static attention and dynamic sequence reasoning. Specifically, the model first employs an SS-ResNet for spectral dimensionality reduction and local feature embedding. Subsequently, the architecture bifurcates into a parallel encoding stage: one branch leverages Multi-Head Self-Attention (MHSA) to capture global spatial correlations, while the other introduces a Local Enhanced Mamba (LEM) branch. By integrating State Space Models (SSM) with depthwise separable convolutions, the LEM branch simultaneously captures long-range causal dependencies and local spatial context. Finally, a dual class token fusion strategy is designed to integrate heterogeneous representations at the decision level. 
Extensive experiments on four benchmark datasets (Pavia University, Indian Pines, Salinas, and WHU-HongHu) show that DualMambaFormer achieves OA values of 96.56%, 98.95%, 97.60%, and 96.09%, respectively, with consistently high AA and Kappa coefficients. Compared with the second-best competing methods, DualMambaFormer improves OA by 5.55, 2.30, 1.68, and 4.30 percentage points on the same four datasets, respectively. These results demonstrate the effectiveness, robustness, and generalization capability of the proposed method for hyperspectral image classification.
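
The abstract contrasts the quadratic cost of self-attention with the linear cost of state space models. The sketch below illustrates that contrast with a toy scalar recurrence, h_t = a·h_(t-1) + b·x_t and y_t = c·h_t. It is an assumption-laden simplification: Mamba uses input-dependent (selective) parameters and vector-valued states, whereas the fixed scalars `a`, `b`, `c` and the helper names here are hypothetical and are not the paper's LEM branch.

```python
def ssm_scan(xs, a=0.9, b=1.0, c=1.0):
    """Run a scalar linear state space recurrence over a sequence.

    h_t = a * h_{t-1} + b * x_t
    y_t = c * h_t

    The sequence is visited once, so the cost grows linearly with its length.
    Each output depends only on current and earlier inputs (causal).
    """
    h = 0.0
    ys = []
    for x in xs:
        h = a * h + b * x
        ys.append(c * h)
    return ys


def attention_pair_count(seq_len):
    """Number of query-key scores full self-attention computes per head."""
    return seq_len * seq_len


def scan_step_count(seq_len):
    """Number of recurrence updates the scan above performs."""
    return seq_len


# An impulse decays geometrically through the state.
impulse_response = ssm_scan([1.0, 0.0, 0.0], a=0.5)
print(impulse_response)

# Token counts for hypothetical square patches of increasing size.
for side in (7, 9, 11):
    n = side * side
    print(side, scan_step_count(n), attention_pair_count(n))
```

The causal nature of the recurrence is also why the abstract describes plain Mamba as a poor fit for non-causal 2D images, which motivates pairing it with local convolutions and a parallel attention branch.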

Cite This Study

Jiang et al. studied this question.

synapsesocial.com/papers/6a04158679e20c90b44453c2 https://doi.org/10.3390/rs18101516
