This study addresses three challenges that motion posture recognition poses for sports skill training: limited labeled data, sensitivity to environmental interference, and weak adaptability to individual differences. It proposes a spatiotemporal graph convolutional recognition framework enhanced by multimodal data augmentation. At the data level, a Spatio-Temporal Hybrid Augmentation (STHA) strategy is designed that integrates a skeleton sequence synthesis method based on a Generative Adversarial Network (GAN) with joint geometric-photometric transformations. Adversarial samples are generated using a displacement matrix of key skeletal joint points, and Random Frame Sampling (RFS) and Spatio-Temporal Elastic Deformation (STED) mechanisms are introduced to enhance data diversity. At the model level, an improved Spatio-Temporal Graph Convolutional Network (ST-GCN) is constructed. A Deformable Graph Convolution Kernel (DGCK) dynamically adapts to the topological relationships between human joints, and an embedded Attention-guided Temporal Aggregation Module (ATAM) captures temporal features. Experimental results show that the proposed method achieves 89.2% mean Average Precision (mAP) under the cross-subject protocol on the NTU-RGB+D 120 dataset, outperforming the best baseline by 5.5 percentage points. In a few-shot setting with only 10% of the training data, the hybrid augmentation framework improves recognition accuracy by 12.7%. In practical sports applications, the system detects diving twist angles with an error within ±3.1°, and its gymnastics scores reach a Pearson correlation coefficient of 0.91. The processing speed reaches 23.7 fps, which meets the real-time demands of training feedback. This study overcomes key barriers in markerless motion capture and real-time feedback for athletic training, offering quantifiable, intelligent algorithmic support for the scientific advancement of sports practice.
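To make two of the named components more concrete, the following is a minimal sketch of how random frame sampling and attention-weighted temporal aggregation might look. It is an illustration under stated assumptions, not the method evaluated in this study: the segment-based sampling scheme, the softmax weighting, and the names `random_frame_sample` and `attention_pool` are all hypothetical, and the attention scores are passed in directly rather than learned.

```python
import math
import random


def random_frame_sample(num_frames, num_samples, rng=None):
    """Pick one random frame index from each of num_samples equal segments.

    Assumption: segment-wise sampling is one common way to realize random
    frame sampling; the source does not specify the exact scheme.
    """
    if num_frames <= 0 or num_samples <= 0:
        raise ValueError("num_frames and num_samples must be positive")
    rng = rng or random.Random()
    indices = []
    for k in range(num_samples):
        start = k * num_frames // num_samples
        end = (k + 1) * num_frames // num_samples
        if end <= start:
            # Clip shorter than num_samples: the segment is empty, reuse a frame.
            indices.append(start)
        else:
            indices.append(rng.randrange(start, end))
    return indices


def attention_pool(features, scores):
    """Aggregate per-frame feature vectors with softmax attention weights.

    features: list of T vectors (lists of floats), one per frame.
    scores:   list of T raw attention scores (hypothetical; in a trained
              model these would come from a learned scoring layer).
    """
    if not features or len(features) != len(scores):
        raise ValueError("features and scores must be non-empty and equal length")
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(features[0])
    pooled = [sum(w * f[d] for w, f in zip(weights, features)) for d in range(dim)]
    return pooled, weights
```

In this sketch, sampling one frame per segment keeps the temporal order of the clip while varying which frames are seen, and the softmax weights let frames with higher scores contribute more to the pooled feature.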
Zhang et al. studied this question.