What is the clinical evidence from this study?

Study design: Scoping review. Population: Studies of sports-related injuries (n=38 studies). Intervention: Machine learning models vs. logistic regression. Primary outcome: Predictive performance (AUC) for injury risk.

November 29, 2024. Open Access.

Machine learning approaches to injury risk prediction in sport: a scoping review with evidence synthesis

What does this research mean for the field?

In 60% of the reviewed studies, tree-based machine learning models provided the highest predictive performance for sports injury risk, though their clinical utility is currently limited by methodological heterogeneity and small datasets. Novelty: synthesis. Consensus alignment: neutral.

Key Result

Tree-based machine learning models provided the highest predictive performance for sports injury risk in 60% of reviewed studies, though clinical utility remains limited by methodological heterogeneity.

Structured PICO

What is the efficacy of machine learning models for predicting sports-related injuries?

Population

38 studies investigating sports-related injuries (football/soccer most common)

Intervention

Machine learning models (e.g., Random Forest, XGBoost, tree-based solutions)

Comparator

Logistic regression or other machine learning methods

Outcome

Predictive performance (area under the curve [AUC]); surrogate outcome

While machine learning models like Random Forest and XGBoost show strong statistical performance for sports injury prediction, their clinical utility is currently limited by methodological heterogeneity and small datasets.

Limitations

Wide prediction windows
Broad definitions of injury
Small datasets
Methodological heterogeneities including cohort sizes and dependent variables

Abstract

OBJECTIVE: This study reviewed the current state of machine learning (ML) research for the prediction of sports-related injuries. It aimed to chart the various approaches used and assess their efficacy, considering factors such as data heterogeneity, model specificity and contextual factors when developing predictive models.

DESIGN: Scoping review.

DATA SOURCES: PubMed, EMBASE, SportDiscus and IEEEXplore.

RESULTS: In total, 1241 studies were identified, 58 full texts were screened, and 38 relevant studies were reviewed and charted. Football (soccer) was the most commonly investigated sport. Area under the curve (AUC) was the most common means of model evaluation; it was reported in 71% of studies. In 60% of studies, tree-based solutions provided the highest statistical predictive performance. Random Forest and Extreme Gradient Boosting (XGBoost) were found to provide the highest performance for injury risk prediction. Logistic regression outperformed ML methods in 4 out of 12 studies. Three studies reported model performance of AUC>0.9, yet the clinical relevance is questionable.

CONCLUSIONS: A variety of different ML models have been applied to the prediction of sports-related injuries. While several studies report strong predictive performance, their clinical utility can be limited, with wide prediction windows or broad definitions of injury. The efficacy of ML is hampered by small datasets and numerous methodological heterogeneities (cohort sizes, definition of injury and dependent variables), which were common across the reviewed studies.
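Because AUC is the evaluation measure most of the reviewed studies report, a minimal sketch of how it can be computed may help when reading the results. The example below uses the rank-based definition: the probability that a randomly chosen injured athlete receives a higher predicted risk than a randomly chosen uninjured athlete, with ties counted as half. The function name `auc_from_scores` and the toy labels and scores are illustrative assumptions, not data or code from the review.

```python
def auc_from_scores(labels, scores):
    """Rank-based AUC for binary labels (1 = injured, 0 = not injured).

    Returns the fraction of (injured, uninjured) pairs in which the
    injured case has the higher predicted risk; ties count as 0.5.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative label")

    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: six athletes, three of whom were injured.
labels = [1, 0, 1, 0, 0, 1]
scores = [0.9, 0.2, 0.6, 0.7, 0.1, 0.8]  # assumed model risk scores

auc = auc_from_scores(labels, scores)
print(f"AUC = {auc:.3f}")
```

An AUC of 0.5 corresponds to chance-level ranking and 1.0 to perfect ranking. The measure summarises ranking quality only, so a high value does not by itself indicate that a model is useful over a narrow prediction window or for a specific injury definition.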
