Current skeleton-based action recognition methods face two critical bottlenecks: rigid anatomical graph topologies that cannot capture action-specific joint coordination, and local temporal receptive fields that miss long-range motion dependencies. We address these limitations with a hybrid architecture that combines data-driven adaptive graph learning with global Transformer attention. Our key innovation is an embedded Gaussian adaptive graph that discovers non-physical joint relationships (e.g., contralateral limb coordination) through learned similarity functions, complemented by a Transformer encoder-decoder whose learnable action queries selectively extract discriminative temporal phases. Extensive experiments on NTU-RGB+D 60 demonstrate 91.5% cross-subject accuracy, a statistically significant improvement (p < 0.001) over state-of-the-art GCN methods (CTR-GCN: +0.7%, MS-G3D: +2.1%) and Transformer baselines (ST-TR: +2.8%). Ablation studies quantify the synergy: adaptive graphs contribute +5.8%, Transformers contribute +5.3%, and their combination yields an additional +2.5–3.0% gain, validating complementary spatial-temporal modeling. Cross-dataset evaluation on NTU-RGB+D 120 (87.3% accuracy) confirms generalization capability. With 4.7M parameters and a processing time of 230 ms/sample, the model is suitable for offline biomechanics analysis and motion understanding applications.
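The embedded Gaussian adaptive graph mentioned in the abstract can be illustrated with a minimal sketch. It assumes the common formulation in which joint features are projected by two learned maps and the adjacency is the row-wise softmax of their dot products. The abstract does not give the exact equations, so this form, the function names (`embedded_gaussian_adjacency`, `graph_aggregate`), the dimensions, and the random weights standing in for learned parameters are all assumptions made for illustration.

```python
import math
import random


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def softmax(row):
    """Numerically stable softmax over one list of scores."""
    peak = max(row)
    exps = [math.exp(v - peak) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def embedded_gaussian_adjacency(x, w_theta, w_phi):
    """Build a data-dependent V x V adjacency from joint features.

    x        : V x C joint features (one row per joint)
    w_theta  : C x D projection (hypothetical; learned in a real model)
    w_phi    : C x D projection (hypothetical; learned in a real model)
    """
    theta = matmul(x, w_theta)                 # V x D embedded features
    phi = matmul(x, w_phi)                     # V x D embedded features
    phi_t = [list(col) for col in zip(*phi)]   # D x V transpose
    scores = matmul(theta, phi_t)              # V x V pairwise similarities
    return [softmax(r) for r in scores]        # each row sums to 1


def graph_aggregate(adj, x):
    """Mix joint features using the learned adjacency (one graph-conv step, no weights)."""
    return matmul(adj, x)


# Toy example: 5 joints, 4 input channels, 3 embedding channels.
rng = random.Random(0)
V, C, D = 5, 4, 3
x = [[rng.gauss(0, 1) for _ in range(C)] for _ in range(V)]
w_theta = [[rng.gauss(0, 1) for _ in range(D)] for _ in range(C)]
w_phi = [[rng.gauss(0, 1) for _ in range(D)] for _ in range(C)]

adj = embedded_gaussian_adjacency(x, w_theta, w_phi)
out = graph_aggregate(adj, x)
```

Because every pair of joints receives a nonzero weight, this adjacency can link joints with no skeletal edge between them (for example, opposite limbs), which a fixed anatomical graph cannot do.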
JunZhang Chen
Xuchang University
Yue Guanli
Xuchang University
Scientific Reports
Xuchang University
Chen et al. studied this question.
synapsesocial.com/papers/69eb0cb2553a5433e34b5ae8 — DOI: https://doi.org/10.1038/s41598-026-49915-z