Random forest models achieved an accuracy of 85.6% and AUC of 90.5% for predicting injury risk in male football players based on training and recovery data.
Can machine learning models accurately predict sports-related injury risk in male professional football players using daily workload and recovery data?
Ensemble machine learning methods, particularly random forests, can accurately predict sports-related injury risk in professional football players, enabling individualized risk assessment and prevention strategies.
Absolute Event Rate: 0% vs 0%
Abstract Accurate prediction of sports-related injuries is essential for optimizing athlete health and performance. This study evaluated machine learning (ML) models for injury risk in 300 male professional football players (ages 18–28) monitored over two competitive seasons (2021–2022). Injuries were defined as musculoskeletal conditions causing at least one missed training session or match, confirmed via ICD-10 diagnoses. Daily data on training workload, recovery, wellness, heart-rate variability, cumulative minutes played, and injury history were collected. Features were preprocessed with normalization, one-hot encoding, and selected via LASSO regression and recursive feature elimination. Missing data (< 3%) were imputed using multiple imputation by chained equations, and class imbalance was addressed with SMOTE and weighting. Logistic regression, decision tree, and random forest models were trained using 10-fold cross-validation and evaluated for accuracy, precision, recall, F1-score, and AUC. Random forests outperformed other models, achieving accuracy 85.6 ± 2.1%, precision 82.1 ± 1.9%, recall 80.3 ± 2.4%, F1-score 81.2 ± 2.2%, and AUC 90.5 ± 1.6%. Explainable AI techniques, including SHAP and LIME, identified prior injury, training intensity, and recovery time as the strongest predictors, enabling individualized risk assessment. These findings demonstrate that ensemble ML methods provide robust, interpretable, and actionable insights for injury prevention, supporting data-driven strategies to optimize training and reduce injury incidence. Future work should expand validation across multiple sports and integrate additional physiological and genetic factors to enhance predictive accuracy and generalizability.
Xu et al. (Thu,) reported a other. Random forest models achieved an accuracy of 85.6% and AUC of 90.5% for predicting injury risk in male football players based on training and recovery data.