This study aims to address the limitations of traditional sports health assessment methods, such as large individual differences, reliance on single-source data, and a lack of predictive capability. It proposes a Transformer-based multimodal deep learning model that combines a cross-modal attention distillation method with a hybrid Vision Transformer-Bidirectional Long Short-Term Memory (ViT-BiLSTM) architecture. The model integrates wearable sensor time-series data, visual motion data, and self-reported text information. It also incorporates a lightweight Atomistic Line Graph Network (AlignNet) adapter for feature alignment and a dynamic multi-task loss function to optimize training. Experimental results on the University of California Irvine-Human Activity Recognition (UCI-HAR) dataset show that the model achieves an activity recognition accuracy of 95.4% and an F1-score of 95.1%. It maintains an accuracy of 85.3% even at a noise level of 0.4, with a single-inference time of 22.7 ms. The results indicate that the proposed method can effectively integrate multi-source information and improve the accuracy and stability of individualized sports health assessment while ensuring real-time performance, expanding the application of deep learning in smart healthcare.
Qingwei Luan (Wed,) studied this question.