Current skeleton-based action recognition methods face two critical bottlenecks: rigid anatomical graph topologies that cannot capture action-specific joint coordination, and local temporal receptive fields that miss long-range motion dependencies. We address both limitations with a hybrid architecture that combines data-driven adaptive graph learning with global Transformer attention. Our key innovation is an embedded Gaussian adaptive graph that discovers non-physical joint relationships (e.g., contralateral limb coordination) through a learned similarity function. It is complemented by a Transformer encoder-decoder whose learnable action queries selectively extract discriminative temporal phases. Experiments on NTU-RGB+D 60 yield 91.5% cross-subject accuracy, a statistically significant improvement (p < 0.001) over state-of-the-art GCN methods (CTR-GCN: +0.7%, MS-G3D: +2.1%) and a Transformer baseline (ST-TR: +2.8%). Ablation studies quantify the synergy: adaptive graphs contribute +5.8%, Transformers contribute +5.3%, and their combination yields a further +2.5–3.0%, supporting the view that the two components model complementary spatial and temporal structure. Cross-dataset evaluation on NTU-RGB+D 120 (87.3% accuracy) confirms generalization. With 4.7M parameters and a processing time of 230 ms per sample, the model is suited to offline biomechanics analysis and motion understanding applications.
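To make the two components concrete, the following is a minimal plain-Python sketch, not the authors' implementation. It assumes the embedded Gaussian graph is computed as a row-wise softmax over dot products of two linear joint embeddings, as commonly formulated in adaptive GCNs such as 2s-AGCN. It also assumes the learnable action queries act as single-head scaled dot-product cross-attention over per-frame encoder outputs, with no key/value projections. The function names `embedded_gaussian_graph` and `query_cross_attention`, as well as the matrix shapes, are hypothetical.

```python
import math
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain-Python matrix product a @ b (a: N x K, b: K x M)."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(m: Matrix) -> Matrix:
    return [list(row) for row in zip(*m)]


def row_softmax(m: Matrix) -> Matrix:
    """Softmax over each row, shifted by the row max for numerical stability."""
    out = []
    for row in m:
        peak = max(row)
        exps = [math.exp(v - peak) for v in row]
        total = sum(exps)
        out.append([e / total for e in exps])
    return out


def embedded_gaussian_graph(x: Matrix, w_theta: Matrix, w_phi: Matrix) -> Matrix:
    """Hypothetical data-dependent adjacency C = softmax(theta(X) phi(X)^T).

    x: N joints x C features; w_theta, w_phi: C x C_e embedding weights.
    C[i][j] is the normalized similarity of joint i to joint j, so any two
    joints (e.g. left hand and right foot) can be linked regardless of
    whether a physical bone connects them.
    """
    theta = matmul(x, w_theta)  # N x C_e
    phi = matmul(x, w_phi)      # N x C_e
    return row_softmax(matmul(theta, transpose(phi)))  # N x N, rows sum to 1


def query_cross_attention(queries: Matrix, frames: Matrix) -> Matrix:
    """Hypothetical cross-attention of learnable queries over frame embeddings.

    queries: Q x d (learned parameters); frames: T x d encoder outputs.
    Keys and values are the frames themselves (no projections), which keeps
    the sketch short. Returns Q x d weighted summaries of the sequence.
    """
    d = len(queries[0])
    scores = matmul(queries, transpose(frames))                 # Q x T
    scaled = [[s / math.sqrt(d) for s in row] for row in scores]
    weights = row_softmax(scaled)                               # each row sums to 1
    return matmul(weights, frames)                              # Q x d


if __name__ == "__main__":
    joints = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # 3 joints, 2 features each
    identity = [[1.0, 0.0], [0.0, 1.0]]
    print(embedded_gaussian_graph(joints, identity, identity))
    print(query_cross_attention([[1.0, 0.0]], [[1.0, 0.0], [0.0, 1.0]]))
```

Because each query's attention weights form a distribution over frames, a query that scores some frames highly effectively selects those temporal phases. A full model would typically add learned key/value projections, multiple heads, and a fixed physical-skeleton adjacency alongside the learned one.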
Chen et al. (Wed,) studied this question.