September 10, 2025 · Open Access

A Spatiotemporal Bidirectional Mamba Network with Global–Local Skeletal Enhancement for 3D Human Pose Estimation

Key Points

BSTMamba achieves state-of-the-art accuracy in 3D human pose estimation with reduced computational demands.
The model effectively integrates global sequence modeling with localized convolutions, enhancing pose representation.
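DisruptEnhance randomly perturbs joint order at global (full-skeleton) and local (body-part) scales, then compensates features through a lightweight residual subnet, to improve robustness in human motion modeling.
Evaluations on the Human3.6M and MPI-INF-3DHP datasets show state-of-the-art accuracy with fewer parameters and lower MACs than prior methods.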
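
As a rough illustration of the bidirectional, linear-cost sequence modeling named in the title and key points, the sketch below runs a toy linear recurrence over a sequence in both directions and sums the results. It is not the BSTMamba layer: the scalar `decay` and `gain` parameters, the function names, and the choice to sum the two directions are all assumptions made for illustration, and a real Mamba block uses input-dependent (selective) parameters over multi-channel features.

```python
def scan(xs, decay, gain):
    """One pass of the recurrence h_t = decay * h_{t-1} + gain * x_t.

    The cost grows linearly with len(xs), unlike pairwise attention.
    """
    h, out = 0.0, []
    for x in xs:
        h = decay * h + gain * x
        out.append(h)
    return out


def bidirectional_scan(xs, decay=0.5, gain=1.0):
    """Combine a forward scan with a backward scan (assumed: by summation)."""
    forward = scan(xs, decay, gain)
    # Scan the reversed sequence, then flip back so positions line up.
    backward = scan(xs[::-1], decay, gain)[::-1]
    return [f + b for f, b in zip(forward, backward)]


# Each output position now depends on both earlier and later inputs.
impulse_response = bidirectional_scan([1.0, 0.0, 0.0])
symmetric_response = bidirectional_scan([1.0, 2.0, 1.0])
```

With a forward scan alone, the first position would see nothing after it; the backward pass is what gives every frame access to future context.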

Abstract

3D human pose estimation (HPE) is a cornerstone task in computer vision with diverse applications, where lifting 2D pose sequences to 3D representations has attracted significant interest. Transformer-based approaches have demonstrated robust performance but are hampered by quadratic computational complexity and insufficient bidirectional modeling capabilities. The recently introduced Mamba model mitigates these limitations through state-space models (SSMs) offering linear complexity and effective long-range dependencies; however, it falls short in modeling local skeletal interactions essential for human motion.

To address this, we present BSTMamba, a bidirectional spatiotemporal SSM architecture designed specifically for monocular 3D HPE. BSTMamba integrates efficient global sequence modeling with localized convolutions and dynamic gating mechanisms to capture intricate spatiotemporal dependencies. For enhanced robustness and generalization, we introduce DisruptEnhance, a residual-compensated joint-order perturbation module that randomly disrupts joint orders at both global (full-skeleton) and local (body-part) scales, followed by feature compensation via a lightweight residual subnet. Comprehensive evaluations on the Human3.6M and MPI-INF-3DHP datasets reveal that BSTMamba attains state-of-the-art accuracy while requiring fewer parameters and lower multiply-accumulate operations (MACs) compared to prior methods.
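
The joint-order perturbation described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' DisruptEnhance module: the 17-joint body-part grouping in `BODY_PARTS`, all function names, and the use of Python's `random` module are hypothetical, and the residual compensation subnet is omitted because the abstract does not specify it.

```python
import random

# Hypothetical grouping of a 17-joint skeleton into body parts; the index
# layout is an assumption for illustration, not taken from the paper.
BODY_PARTS = {
    "torso": [0, 7, 8, 9, 10],
    "right_leg": [1, 2, 3],
    "left_leg": [4, 5, 6],
    "left_arm": [11, 12, 13],
    "right_arm": [14, 15, 16],
}


def global_shuffle(num_joints, rng):
    """Permutation that reorders all joints of the skeleton."""
    order = list(range(num_joints))
    rng.shuffle(order)
    return order


def local_shuffle(num_joints, parts, rng):
    """Permutation that reorders joints only within each body part."""
    order = list(range(num_joints))
    for indices in parts.values():
        shuffled = indices[:]
        rng.shuffle(shuffled)
        for slot, source in zip(indices, shuffled):
            order[slot] = source
    return order


def apply_order(joints, order):
    """Reorder per-joint features so position i holds joints[order[i]]."""
    return [joints[i] for i in order]


def invert_order(order):
    """Permutation that undoes `order` when passed to apply_order."""
    inverse = [0] * len(order)
    for new_pos, old_pos in enumerate(order):
        inverse[old_pos] = new_pos
    return inverse


rng = random.Random(0)
frame = [(float(j), 0.0) for j in range(17)]  # dummy 2D joints for one frame
global_order = global_shuffle(17, rng)
local_order = local_shuffle(17, BODY_PARTS, rng)
restored = apply_order(apply_order(frame, global_order), invert_order(global_order))
```

Keeping the inverse permutation around lets downstream features be returned to the canonical joint order before any per-joint output is read off.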

Authors

Chuhan Wu

University of Technology Sydney

Zan Wang

Hebei Medical University

Gengze Zhou

Australian Institute of Business

Institutions

Henan University
