Random forest models achieved an accuracy of 85.6% and AUC of 90.5% for predicting injury risk in male football players based on training and recovery data.
Can machine learning models accurately predict sports-related injury risk in male professional football players using daily workload and recovery data?
300 male professional football players (ages 18–28)
Machine learning (ML) models (logistic regression, decision tree, and random forest) trained on daily data including training workload, recovery, wellness, heart-rate variability, cumulative minutes played, and injury history
Prediction of sports-related injuries (defined as musculoskeletal conditions causing at least one missed training session or match, confirmed via ICD-10 diagnoses) evaluated by accuracy, precision, recall, F1-score, and AUC
Ensemble machine learning methods, particularly random forests, can accurately predict sports-related injury risk in professional football players, enabling individualized risk assessment and prevention strategies.
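The evaluation metrics named in the summary above (accuracy, precision, recall, F1-score, AUC) can be illustrated with a small stdlib-only sketch. The confusion-matrix counts and risk scores below are hypothetical values chosen for arithmetic clarity, not figures from the study, and the helper names `classification_metrics` and `auc_from_scores` are introduced here purely for illustration.

```python
def classification_metrics(tp, fp, fn, tn):
    """Return accuracy, precision, recall and F1 from confusion-matrix counts."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    # Precision: of the players flagged as at risk, how many were actually injured.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    # Recall: of the players actually injured, how many were flagged.
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1: harmonic mean of precision and recall.
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


def auc_from_scores(pos_scores, neg_scores):
    """AUC as the fraction of (injured, uninjured) pairs ranked correctly.

    Ties count as half a correct ranking.
    """
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


# Hypothetical counts: 40 true positives, 10 false positives,
# 10 false negatives, 140 true negatives.
metrics = classification_metrics(tp=40, fp=10, fn=10, tn=140)

# Hypothetical predicted risk scores for injured and uninjured player-days.
auc = auc_from_scores([0.9, 0.7, 0.4], [0.8, 0.3, 0.2, 0.1])

print(metrics)
print(round(auc, 3))
```

With imbalanced outcomes such as injuries, accuracy alone can look high even when few injuries are caught, which is presumably why precision, recall, F1-score, and AUC are reported alongside it.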
Abstract
Accurate prediction of sports-related injuries is essential for optimizing athlete health and performance. This study evaluated machine learning (ML) models for injury risk in 300 male professional football players (ages 18–28) monitored over two competitive seasons (2021–2022). Injuries were defined as musculoskeletal conditions causing at least one missed training session or match, confirmed via ICD-10 diagnoses. Daily data on training workload, recovery, wellness, heart-rate variability, cumulative minutes played, and injury history were collected. Features were preprocessed with normalization, one-hot encoding, and selected via LASSO regression and recursive feature elimination. Missing data (< 3%) were imputed using multiple imputation by chained equations, and class imbalance was addressed with SMOTE and weighting. Logistic regression, decision tree, and random forest models were trained using 10-fold cross-validation and evaluated for accuracy, precision, recall, F1-score, and AUC. Random forests outperformed other models, achieving accuracy 85.6 ± 2.1%, precision 82.1 ± 1.9%, recall 80.3 ± 2.4%, F1-score 81.2 ± 2.2%, and AUC 90.5 ± 1.6%. Explainable AI techniques, including SHAP and LIME, identified prior injury, training intensity, and recovery time as the strongest predictors, enabling individualized risk assessment. These findings demonstrate that ensemble ML methods provide robust, interpretable, and actionable insights for injury prevention, supporting data-driven strategies to optimize training and reduce injury incidence. Future work should expand validation across multiple sports and integrate additional physiological and genetic factors to enhance predictive accuracy and generalizability.
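The model comparison described in the abstract (logistic regression, decision tree, and random forest under 10-fold cross-validation, scored on five metrics) could be sketched with scikit-learn roughly as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: the data are synthetic, the hyperparameters are arbitrary, class imbalance is handled only through `class_weight="balanced"` (the abstract's SMOTE step is omitted), and the LASSO/RFE feature selection and chained-equation imputation steps are left out. Reporting each metric as mean ± standard deviation across folds is also an assumption about how the ± values were derived.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_validate
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the monitoring data (NOT the study data):
# 12 numeric features, about 15% positive ("injured") samples.
X, y = make_classification(
    n_samples=600, n_features=12, n_informative=5,
    weights=[0.85, 0.15], random_state=0,
)

# The three model families named in the abstract; settings are assumptions.
models = {
    "logistic_regression": make_pipeline(
        StandardScaler(),  # normalization, fitted inside each training fold
        LogisticRegression(max_iter=1000, class_weight="balanced"),
    ),
    "decision_tree": DecisionTreeClassifier(
        max_depth=5, class_weight="balanced", random_state=0
    ),
    "random_forest": RandomForestClassifier(
        n_estimators=100, class_weight="balanced", random_state=0
    ),
}

# Stratified folds keep the injury rate similar in every fold.
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
scoring = ["accuracy", "precision", "recall", "f1", "roc_auc"]


def evaluate(models, X, y):
    """Cross-validate each model; return {model: {metric: (mean, std)}}."""
    summary = {}
    for name, model in models.items():
        scores = cross_validate(model, X, y, cv=cv, scoring=scoring)
        summary[name] = {
            metric: (scores[f"test_{metric}"].mean(), scores[f"test_{metric}"].std())
            for metric in scoring
        }
    return summary


summary = evaluate(models, X, y)
for name, metrics in summary.items():
    line = ", ".join(f"{m} {mean:.3f} ± {std:.3f}" for m, (mean, std) in metrics.items())
    print(f"{name}: {line}")
```

Wrapping the scaler in a pipeline keeps preprocessing inside each training fold, which avoids leaking information from the held-out fold. Any resampling step such as SMOTE would need the same treatment.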
Zhenhua Xu
Nanjing University of Chinese Medicine
WeiYa Sun
Silpakorn University
Haonan Qian
Anshan Normal University
BMC Medical Informatics and Decision Making
Hanyang University
Xu et al. reported that random forest models achieved an accuracy of 85.6% and an AUC of 90.5% for predicting injury risk in male football players based on training and recovery data.
synapsesocial.com/papers/6963222391e05aa366cb8a7e — DOI: https://doi.org/10.1186/s12911-025-03331-x