Only 10 of 59 available machine learning prediction models in orthopedic surgery have been externally validated, and the 18 available validation studies demonstrated incomplete reporting of performance measures with a median TRIPOD completeness of 61%.
Systematic Review (n=18)
What is the availability and reporting quality of external validations of machine-learning prediction models for orthopedic surgical outcomes?
Most predictive ML models in orthopedics lack external validation, and available validation studies suffer from incomplete reporting, limiting their clinical implementation.
Background and purpose - External validation of machine learning (ML) prediction models is an essential step before clinical application. We assessed the proportion, performance, and transparent reporting of externally validated ML prediction models in orthopedic surgery, using the Transparent Reporting for Individual Prognosis or Diagnosis (TRIPOD) guidelines.
Material and methods - We performed a systematic search using synonyms for every orthopedic specialty, ML, and external validation. The proportion was determined by using 59 ML prediction models with only internal validation in orthopedic surgical outcome published up until June 18, 2020, previously identified by our group. Model performance was evaluated using discrimination, calibration, and decision-curve analysis. The TRIPOD guidelines assessed transparent reporting.
Results - We included 18 studies externally validating 10 different ML prediction models of the 59 available ML models after screening 4,682 studies. All external validations identified in this review retained good discrimination. Other key performance measures were provided in only 3 studies, rendering overall performance evaluation difficult. The overall median TRIPOD completeness was 61% (IQR 43-89), with 6 items being reported in less than 4/18 of the studies.
Interpretation - Most current predictive ML models are not externally validated. The 18 available external validation studies were characterized by incomplete reporting of performance measures, limiting a transparent examination of model performance. Further prospective studies are needed to validate or refute the myriad of predictive ML models in orthopedics while adhering to existing guidelines. This ensures clinicians can take full advantage of validated and clinically implementable ML decision tools.
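The three performance measures named in the methods (discrimination, calibration, and decision-curve analysis) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the function names, the choice of the observed/expected ratio as a simple calibration summary, and the toy outcomes and predicted probabilities are hypothetical and are not taken from the reviewed studies.

```python
from typing import Sequence


def auc(y_true: Sequence[int], y_prob: Sequence[float]) -> float:
    """Discrimination: probability that a random event case receives a
    higher predicted risk than a random non-event case (ties count 0.5)."""
    pos = [p for y, p in zip(y_true, y_prob) if y == 1]
    neg = [p for y, p in zip(y_true, y_prob) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one event and one non-event")
    wins = sum(
        1.0 if pp > pn else 0.5 if pp == pn else 0.0
        for pp in pos
        for pn in neg
    )
    return wins / (len(pos) * len(neg))


def observed_expected_ratio(y_true: Sequence[int], y_prob: Sequence[float]) -> float:
    """Calibration summary: observed events divided by the sum of predicted
    risks. Values near 1 suggest predictions are right on average."""
    return sum(y_true) / sum(y_prob)


def net_benefit(y_true: Sequence[int], y_prob: Sequence[float], threshold: float) -> float:
    """Decision-curve analysis at one threshold probability:
    TP/n - FP/n * threshold / (1 - threshold)."""
    n = len(y_true)
    tp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 1)
    fp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 0)
    return tp / n - fp / n * threshold / (1 - threshold)


# Hypothetical external-validation cohort (toy data, not from the review).
y_true = [0, 0, 1, 0, 1, 1, 0, 1]
y_prob = [0.1, 0.3, 0.35, 0.4, 0.6, 0.7, 0.2, 0.9]

print("AUC:", auc(y_true, y_prob))
print("O/E ratio:", observed_expected_ratio(y_true, y_prob))
print("Net benefit at 0.5:", net_benefit(y_true, y_prob, 0.5))
```

A model can keep a high AUC in a new cohort while its observed/expected ratio drifts away from 1 or its net benefit falls below a treat-all strategy, which is why reporting discrimination alone leaves performance hard to judge.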
Groot et al. conducted a systematic review of orthopedic surgical outcomes (n=18). Machine learning prediction models were evaluated on the proportion, performance, and transparent reporting (TRIPOD guidelines) of externally validated ML prediction models. Only 10 of 59 available machine learning prediction models in orthopedic surgery have been externally validated, and the 18 available validation studies demonstrated incomplete reporting of performance measures with a median TRIPOD completeness of 61%.