Abstract: We systematically reviewed machine learning (ML) fracture risk prediction models developed solely from administrative data, evaluating their development, performance, risk of bias, and applicability concerns. A systematic search of PubMed, Embase, IEEE, and Web of Science was conducted following PRISMA guidelines, covering records up to November 2025. We included studies that developed or validated ML models for osteoporotic fracture risk in adults using administrative data without clinical measurements. Risk of bias and applicability concerns were assessed using the PROBAST tool. Seven studies were included from 3435 initial records. A range of ML models was used, including Random Forests, XGBoost, LASSO-regularized Logistic Regression, and Neural Networks. Discriminative performance was moderate to good, with the best area under the curve (AUC) values ranging from 0.818 for osteoporotic fractures in general to 0.905 for hip fractures only. While five studies showed low risk of bias, five raised applicability concerns because they relied on specific subpopulations or on database-specific features. Only two studies used external validation, and none directly evaluated the clinical utility of the models. ML models using solely administrative data show some promise as scalable, automated tools for fracture risk prediction that require no manual clinical input. However, clinical implementation is hindered by limited external validation and the lack of formal utility evaluation. Future research should prioritize rigorous external validation and recalibration of existing models over the development of novel ones, ensuring that models are robust, interpretable, and integrable into future clinical workflows.
Hansen et al. studied this question.