What does this research mean for the field?

The proposed Transformer-based multimodal deep learning model reaches 95.4% activity recognition accuracy on the UCI-HAR dataset and remains stable under noisy conditions. Novelty: novel finding. Consensus alignment: neutral.

What question did this study set out to answer?

The aim is to enhance sports health assessments by addressing individual differences and integrating multiple data sources.

February 26, 2026

Transformer for Individualized Sports Motion Analysis and Health-Related Activity Recognition Based on Multimodal Deep Learning Method

Key Points

The aim is to enhance sports health assessments by addressing individual differences and integrating multiple data sources.
Developed a multimodal deep learning model using a Transformer architecture.
Integrated wearable sensor time-series data, visual motion data, and self-reported text data.
Utilized a lightweight AlignNet adapter for feature alignment.
Employed a dynamic multi-task loss function to optimize training.
Achieved an activity recognition accuracy of 95.4% on the UCI-HAR dataset.
Achieved an F1-score of 95.1%, indicating that both precision and recall are high.
Maintained 85.3% accuracy at a noise level of 0.4.
Measured a single inference time of 22.7 ms.
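The summary does not specify how the dynamic multi-task loss assigns weights. One widely used scheme is Dynamic Weight Average (DWA, Liu et al., 2019). It reweights each task at every epoch according to how quickly that task's loss is falling. The sketch below uses DWA as a stand-in. It is not the authors' loss. The task names "activity" and "health" are hypothetical placeholders, and temperature 2.0 is a common default rather than a value from the study.

```python
import math
from typing import Dict, List


def dwa_weights(loss_history: Dict[str, List[float]],
                temperature: float = 2.0) -> Dict[str, float]:
    """Dynamic Weight Average task weights (a stand-in, not the study's loss).

    loss_history maps each (hypothetical) task name to its per-epoch
    average losses. The returned weights sum to the number of tasks K.
    """
    tasks = list(loss_history)
    k = len(tasks)
    # Until every task has two epochs of history, use equal weights.
    if any(len(h) < 2 for h in loss_history.values()):
        return {t: 1.0 for t in tasks}
    # r_k = L_k(t-1) / L_k(t-2): a larger ratio means the task's loss is
    # shrinking more slowly, so the task receives a larger weight.
    ratios = {t: loss_history[t][-1] / loss_history[t][-2] for t in tasks}
    exps = {t: math.exp(ratios[t] / temperature) for t in tasks}
    total = sum(exps.values())
    return {t: k * exps[t] / total for t in tasks}


def combined_loss(current: Dict[str, float],
                  weights: Dict[str, float]) -> float:
    """Weighted sum of the current per-task losses."""
    return sum(weights[t] * current[t] for t in current)


# Usage: the "health" loss fell more slowly, so it is weighted more heavily.
history = {"activity": [1.0, 0.5], "health": [1.0, 0.9]}
w = dwa_weights(history)
total = combined_loss({"activity": 0.4, "health": 0.8}, w)
```

The temperature controls how sharply the weights diverge. A very large temperature drives all weights toward 1.0, which recovers a plain unweighted sum.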

Abstract

This study addresses the limitations of traditional sports health assessment methods, namely difficulty handling significant individual differences, reliance on single-source data, and a lack of predictive capability, by proposing a multimodal deep learning model based on the Transformer architecture. The model combines a cross-modal attention distillation method with a hybrid Vision Transformer-Bidirectional Long Short-Term Memory (ViT-BiLSTM) architecture. It integrates wearable sensor time-series data, visual motion data, and self-reported text information. It also incorporates a lightweight Atomistic Line Graph Network (AlignNet) adapter for feature alignment and a dynamic multi-task loss function to optimize training. Experimental results on the University of California Irvine Human Activity Recognition (UCI-HAR) dataset show that the model achieves an activity recognition accuracy of 95.4% and an F1-score of 95.1%. It maintains an accuracy of 85.3% at a noise level of 0.4, with a single inference time of 22.7 ms. These results indicate that the proposed method can effectively integrate multi-source information and improve the accuracy and stability of individualized sports health assessment while preserving real-time performance. The method also extends the application of deep learning in smart healthcare.
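The core idea behind cross-modal attention is that features from one modality act as queries over features from another. For example, each wearable-sensor time step can "look up" the visual frames most relevant to it. The sketch below shows only this fusion step, using plain scaled dot-product attention with a residual connection. It assumes both modalities are already embedded in a shared dimension, which is what an alignment adapter would provide. It omits the learned query/key/value projections, the distillation objective, and the ViT-BiLSTM encoders. It is an illustration under those assumptions, not the study's implementation.

```python
import math
from typing import List

Matrix = List[List[float]]


def softmax(xs: List[float]) -> List[float]:
    """Numerically stable softmax."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]


def cross_attention(queries: Matrix, keys: Matrix, values: Matrix) -> Matrix:
    """Scaled dot-product attention where queries come from one modality
    and keys/values from another. Learned Q/K/V projections are omitted."""
    d = len(keys[0])
    scale = 1.0 / math.sqrt(d)
    out = []
    for q in queries:
        # Similarity of this query to every key, scaled by 1/sqrt(d).
        scores = [scale * sum(qi * ki for qi, ki in zip(q, k)) for k in keys]
        attn = softmax(scores)
        # Each output row is an attention-weighted average of the value rows.
        out.append([sum(a * v[j] for a, v in zip(attn, values))
                    for j in range(len(values[0]))])
    return out


def fuse(sensor: Matrix, visual: Matrix) -> Matrix:
    """Residual fusion: each sensor step (query) is enriched with the
    visual frames (keys/values) it attends to. Assumes a shared dimension."""
    attended = cross_attention(sensor, visual, visual)
    return [[s + a for s, a in zip(srow, arow)]
            for srow, arow in zip(sensor, attended)]


# Usage: 2 sensor steps attend over 3 visual frames, all 2-dimensional.
sensor_steps = [[1.0, 0.0], [0.0, 1.0]]
visual_frames = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
fused = fuse(sensor_steps, visual_frames)
```

The residual addition keeps the original sensor signal intact, so if the visual stream is noisy the fused features can still fall back on the sensor data. This is one plausible reason attention-based fusion can help robustness, though the summary does not state the authors' exact mechanism.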
