What question did this study set out to answer?

The aim is to enhance human action recognition by improving spatiotemporal modeling techniques.

June 4, 2026 | Open Access

MADS-GCN: A Robust Interactive Memory-Augmented Dual-Stream GCN with Adaptive Spatiotemporal Modeling for Human Action Recognition

Key Points

The authors developed MADS-GCN, which combines a Physical Stream and a Topological Stream for spatial modeling.
They implemented a channel-temporal attention mechanism followed by a bidirectional GRU for temporal modeling.
Experiments were conducted on the NTU RGB+D60, Northwestern-UCLA, and DanceBasic-Set datasets.
MADS-GCN showed significant improvements in action recognition accuracy across the datasets tested.
The framework effectively captures both global structural patterns and local adaptive features.
The bidirectional GRU captured multi-scale temporal patterns, enhancing performance.

Abstract

Human action recognition is a key research area in computer vision, where accurate recognition relies on effective modeling of both global and local spatiotemporal information. However, existing GCN-based methods often overemphasize the local topological connectivity of human skeletons. Moreover, their temporal modules fail to fully capture the evolution of action sequences, leading to critical instantaneous information being obscured by global representations. To address these problems, we propose an integrated framework termed MADS-GCN. In the spatial modeling stage, we introduce two parallel streams: the Physical Stream uses the adjacency matrix to constrain convolution and capture global structural patterns, while the Topological Stream leverages spatial attention to assign adaptive weights to joints, preserving discriminative local adaptive features. For temporal modeling, a channel-temporal attention mechanism is applied to adaptively refine feature maps, followed by a bidirectional GRU to capture multi-scale temporal patterns. Extensive experiments on NTU RGB+D60, Northwestern-UCLA, and our custom DanceBasic-Set demonstrate the effectiveness of MADS-GCN and indicate its applicability to dance action recognition scenarios.
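
The pipeline described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the 5-joint chain skeleton, the layer sizes, the random weights, the summation used to fuse the two streams, the mean pooling over joints, and the parameter-free attention gates are all assumptions made for illustration, and every function name is hypothetical. Biases and the memory-augmentation component named in the title are omitted.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def normalize_adjacency(A):
    # Symmetric normalization D^{-1/2} (A + I) D^{-1/2}, a common GCN choice.
    A_hat = A + np.eye(A.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(A_hat.sum(axis=1))
    return A_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

def physical_stream(X, A_norm, W):
    # Aggregate each joint's neighbours along the fixed skeleton graph.
    # X: (T, V, C_in), A_norm: (V, V), W: (C_in, C_out)
    return np.einsum("uv,tvc->tuc", A_norm, X) @ W

def topological_stream(X, Wq, Wk, W):
    # Per-frame joint-to-joint attention yields data-dependent joint weights.
    Q, K = X @ Wq, X @ Wk
    attn = softmax(Q @ K.transpose(0, 2, 1) / np.sqrt(Q.shape[-1]), axis=-1)
    return attn @ X @ W, attn

def channel_temporal_attention(F):
    # Assumption: parameter-free gates for brevity; a trained model would learn them.
    # F: (T, C) per-frame features.
    channel_gate = sigmoid(F.mean(axis=0, keepdims=True))   # (1, C)
    temporal_gate = sigmoid(F.mean(axis=1, keepdims=True))  # (T, 1)
    return F * channel_gate * temporal_gate

def init_gru(c_in, hidden):
    # Weight order: Wz, Uz, Wr, Ur, Wh, Uh (biases omitted).
    shapes = [(c_in, hidden), (hidden, hidden)] * 3
    return [0.1 * rng.standard_normal(s) for s in shapes]

def gru_pass(seq, params):
    # Standard GRU recurrence over a (T, C) sequence; returns (T, H).
    Wz, Uz, Wr, Ur, Wh, Uh = params
    h = np.zeros(Uz.shape[0])
    states = []
    for x in seq:
        z = sigmoid(x @ Wz + h @ Uz)                  # update gate
        r = sigmoid(x @ Wr + h @ Ur)                  # reset gate
        h_tilde = np.tanh(x @ Wh + (r * h) @ Uh)      # candidate state
        h = (1.0 - z) * h + z * h_tilde
        states.append(h)
    return np.stack(states)

def bidirectional_gru(seq, fwd_params, bwd_params):
    forward = gru_pass(seq, fwd_params)
    # Run over the reversed sequence, then flip back so time steps align.
    backward = gru_pass(seq[::-1], bwd_params)[::-1]
    return np.concatenate([forward, backward], axis=-1)

# Toy input: T frames, V joints on a chain skeleton, C_in coordinates per joint.
T, V, C_in, C_out, H = 8, 5, 3, 4, 6
A = np.zeros((V, V))
for i, j in [(0, 1), (1, 2), (2, 3), (3, 4)]:
    A[i, j] = A[j, i] = 1.0
A_norm = normalize_adjacency(A)
X = rng.standard_normal((T, V, C_in))

phys = physical_stream(X, A_norm, 0.1 * rng.standard_normal((C_in, C_out)))
topo, attn = topological_stream(
    X,
    0.1 * rng.standard_normal((C_in, C_in)),
    0.1 * rng.standard_normal((C_in, C_in)),
    0.1 * rng.standard_normal((C_in, C_out)),
)
fused = phys + topo                  # assumption: streams fused by summation
frames = fused.mean(axis=1)          # assumption: mean-pool joints, giving (T, C_out)
refined = channel_temporal_attention(frames)
out = bidirectional_gru(refined, init_gru(C_out, H), init_gru(C_out, H))
print(out.shape)
```

With these toy sizes the final feature map has shape (8, 12): one row per frame, with the forward and backward hidden states concatenated. The contrast between the two spatial streams is visible in the code: `physical_stream` mixes joints through a fixed, normalized adjacency matrix, whereas `topological_stream` recomputes the joint-to-joint weights from the input at every frame.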

Cite This Study

Wang et al. studied this question.

synapsesocial.com/papers/6a2115f6d499ed480b16f005 https://doi.org/10.3390/app16115408
