Skeleton-based action recognition networks have widely adopted Graph Convolutional Networks (GCNs) because of their strength in modeling data topology, but several key issues still require further investigation. First, GCN-based models extract action features by applying temporal convolution to each key point separately, which causes the model to ignore the temporal connections between different key points. Second, the local receptive field of graph convolution limits its ability to capture correlations between non-adjacent joints. Motivated by the State Space Model (SSM), we propose an Action Spatio-temporal Aggregation Network, named ActionMamba. Specifically, we introduce a novel embedding module called the Action Characteristic Encoder (ACE), which enhances the coupling of temporal and spatial information in skeletal features by combining intrinsic spatio-temporal encoding with extrinsic space encoding. Additionally, we design an Action Perception Model (APM) based on Mamba and GCN. By combining the feature processing capabilities of GCN with the global information modeling capabilities of Mamba, APM is able to capture the hidden features between different joints and selectively filter information from various joints. Extensive experimental results demonstrate that ActionMamba achieves highly competitive performance on three challenging benchmark datasets: NTU-RGB+D 60, NTU-RGB+D 120, and UAV–Human.
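The local receptive field limitation mentioned in the abstract can be illustrated with a minimal sketch. It is not the ActionMamba implementation: the 5-joint chain skeleton, the function names, and the weight-free aggregation step are all assumptions made for illustration. It uses the common symmetric normalization D^{-1/2}(A + I)D^{-1/2} and shows that a signal placed on one joint reaches a joint k hops away only after k aggregation steps.

```python
# Hypothetical 5-joint chain skeleton 0-1-2-3-4 (an assumption, not the paper's graph).
EDGES = [(0, 1), (1, 2), (2, 3), (3, 4)]
NUM_JOINTS = 5


def normalized_adjacency(num_joints, edges):
    """Return D^{-1/2} (A + I) D^{-1/2} as a nested list."""
    # Start from the identity so every joint has a self-loop.
    a = [[1.0 if i == j else 0.0 for j in range(num_joints)]
         for i in range(num_joints)]
    for i, j in edges:
        a[i][j] = 1.0
        a[j][i] = 1.0
    deg = [sum(row) for row in a]
    return [[a[i][j] / (deg[i] ** 0.5 * deg[j] ** 0.5)
             for j in range(num_joints)]
            for i in range(num_joints)]


def propagate(adj, features):
    """One aggregation step without learned weights: X' = A_hat X."""
    n = len(adj)
    return [sum(adj[i][j] * features[j] for j in range(n)) for i in range(n)]


def layers_until_reached(adj, source, target):
    """Count aggregation steps until a signal at `source` reaches `target`.

    Returns None if the target is not reached within len(adj) steps
    (for example, when the graph is disconnected).
    """
    features = [0.0] * len(adj)
    features[source] = 1.0
    for step in range(1, len(adj) + 1):
        features = propagate(adj, features)
        if features[target] != 0.0:
            return step
    return None


A_HAT = normalized_adjacency(NUM_JOINTS, EDGES)
# After one step, only joint 0 and its direct neighbor (joint 1) are non-zero.
one_hop = propagate(A_HAT, [1.0, 0.0, 0.0, 0.0, 0.0])
```

In this toy graph, `layers_until_reached(A_HAT, 0, 4)` counts how many stacked aggregation steps are needed before the two ends of the chain exchange any information, which is the kind of non-adjacent correlation a global sequence model is meant to capture more directly.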
Jinglong Wen
Dan Liu
Bin Zheng
Electronics
North University of China
Wen et al. studied this question.
www.synapsesocial.com/papers/68d44a4031b076d99fa539fb — DOI: https://doi.org/10.3390/electronics14183610