Table tennis strokes are fast and subtle, which makes them hard to capture and leads to low recognition rates. To address this problem, this work proposes a motion recognition method based on multimodal data and an optimized Spatial-Temporal Graph Convolutional Network (ST-GCN). The model introduces a Multi-Level Graph Convolutional Network (ML-GCN) architecture with cross-level feature extraction channels that capture the spatiotemporal correlations between local subtle movements and global trajectories. A built-in hybrid attention mechanism assigns adaptive weights so that the model focuses on key skeletal nodes and core motion frames. A multimodal fusion strategy that combines visual signals with inertial sensor data significantly improves the robustness of the model under line-of-sight occlusion and motion blur. On a self-built multimodal table tennis dataset, the method achieves an accuracy of 88.2%, a recall of 89.5%, and an F1-score of 88.3%. This performance is significantly better than that of the original ST-GCN and existing mainstream motion recognition algorithms, which confirms the contribution of each optimization module to feature representation capability and computational efficiency. The study provides an efficient technical solution for the intelligent analysis of complex sports movements.
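The idea of attending to both key skeletal nodes and core motion frames can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the function name `hybrid_attention`, the use of mean absolute activation as a saliency score, and the toy input are hypothetical and are not taken from the paper, whose model would learn its attention weights during training.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def hybrid_attention(features):
    """Reweight features[t][v] (frame t, joint v) by frame and joint attention.

    Assumption: the saliency of a joint or frame is its mean absolute
    activation; a trained model would learn these scores instead.
    """
    num_frames = len(features)
    num_joints = len(features[0])

    # Spatial branch: one score per joint, averaged over all frames.
    joint_scores = [
        sum(abs(features[t][v]) for t in range(num_frames)) / num_frames
        for v in range(num_joints)
    ]
    # Temporal branch: one score per frame, averaged over all joints.
    frame_scores = [sum(abs(x) for x in frame) / num_joints for frame in features]

    joint_w = softmax(joint_scores)
    frame_w = softmax(frame_scores)

    # Combine both branches multiplicatively on every (frame, joint) entry.
    weighted = [
        [features[t][v] * frame_w[t] * joint_w[v] for v in range(num_joints)]
        for t in range(num_frames)
    ]
    return weighted, joint_w, frame_w


# Toy input (hypothetical): 3 frames x 4 joints; joint 2 moves most in frame 1.
toy = [
    [0.1, 0.0, 0.2, 0.1],
    [0.1, 0.1, 2.0, 0.2],
    [0.0, 0.1, 0.3, 0.1],
]
weighted, joint_w, frame_w = hybrid_attention(toy)
```

In this toy input the largest spatial weight falls on joint 2 and the largest temporal weight on frame 1, which is the behavior the abstract describes as focusing on key nodes and core frames. A real implementation would operate on multi-channel tensors and learned projections rather than scalar activations.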
Lu et al. studied this question.