April 10, 2026 · Open Access

Machine learning-based individualized prediction: risk assessment of retinopathy in preterm infants at high altitude

Key points

This study aimed to develop machine learning models to predict the risk of retinopathy of prematurity (ROP) in preterm infants at high altitude
Retrospective analysis of clinical data from 2,138 preterm infants
Cohort split 7:3 into training (n = 1,496) and testing (n = 642) sets
Use of LASSO logistic regression for predictor selection
Training of nine machine learning algorithms and a soft-voting ensemble
Model evaluation on the independent testing set
Random forest achieved the highest AUC (0.850), indicating the strongest discrimination of the models tested
XGBoost and the soft-voting ensemble showed similar AUCs (both 0.845)
XGBoost provided the best overall classification balance with an F1-score of 0.622
Machine learning models effectively stratified ROP risk using routinely available pre-screening clinical variables
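The 7:3 split of 2,138 infants into 1,496 training and 642 testing cases can be checked with a short calculation. The abstract does not say how fractional counts were rounded; the minimal sketch below assumes the convention scikit-learn's `train_test_split` uses for a float `test_size` (test count rounded up, remainder assigned to training), which reproduces the reported sizes. The helper name `split_indices` and the seed are hypothetical, and the sketch performs a plain random split: whether the study stratified by ROP status is not stated.

```python
import math
import random


def split_indices(n_samples, test_fraction=0.3, seed=0):
    """Hypothetical helper: randomly split sample indices into train/test sets.

    Assumes the test count is ceil(test_fraction * n_samples) and the
    remaining samples go to training (an assumption, not from the study).
    """
    n_test = math.ceil(test_fraction * n_samples)
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # seeded shuffle for reproducibility
    test_idx = sorted(indices[:n_test])
    train_idx = sorted(indices[n_test:])
    return train_idx, test_idx


train_idx, test_idx = split_indices(2138)
print(len(train_idx), len(test_idx))  # 1496 642
```

Here 0.3 × 2,138 = 641.4, which rounds up to 642 testing cases and leaves 1,496 for training.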

Abstract

Retinopathy of prematurity (ROP) is a leading cause of childhood blindness. Chronic hypoxia at high altitude may create a distinct risk profile and potentially influence the onset and progression of ROP. However, predictive tools designed specifically for preterm infants in high-altitude settings remain limited. This study aimed to develop and validate machine learning (ML)-based models for individualized ROP risk prediction in this population. We retrospectively collected clinical data from 2,138 preterm infants who underwent standardized ROP screening at Qinghai Red Cross Hospital between May 2014 and May 2025. The cohort was divided in a 7:3 ratio into a training set (n = 1,496) and an independent testing set (n = 642). The prediction timepoint (t0) was defined as immediately before the first scheduled ROP fundus screening. Candidate predictors were first screened by univariate analysis, then assessed for temporal plausibility and selected by LASSO logistic regression (λ_min), yielding 24 predictors for subsequent modeling. Nine ML algorithms (logistic regression, decision tree, random forest, XGBoost, LightGBM, support vector machine (SVM), Gaussian Naive Bayes (GaussianNB), multilayer perceptron (MLP), and TabNet) and a soft-voting ensemble (random forest + XGBoost + LightGBM) were trained using five-fold cross-validation and tuned via Bayesian optimization. Model performance was then evaluated on the independent testing set. On the testing set, the random forest model achieved the highest discrimination (AUC = 0.850, 95% CI: 0.795–0.906; PR-AUC = 0.607). XGBoost and the soft-voting ensemble had similar AUCs (both 0.845), with XGBoost providing the best overall classification balance (F1-score = 0.622; MCC = 0.573). ML models based on routinely available pre-screening clinical variables can effectively stratify ROP risk in preterm infants at high altitude. Anchoring predictions to the timepoint immediately before the first scheduled ROP examination enhances clinical interpretability and temporal validity and may enable more efficient allocation of screening resources in this setting.
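How a soft-voting ensemble combines its members, and how the reported test-set metrics relate to one another, can be illustrated with a toy example. Soft voting averages the predicted ROP probabilities of the member models (in the study, random forest, XGBoost and LightGBM). AUC is computed from those probabilities, while F1-score and MCC depend on the threshold that turns probabilities into yes/no predictions. The sketch below is a minimal illustration, not the authors' pipeline: the labels and probabilities are invented, the average is unweighted, and the 0.5 threshold is an assumption because the abstract reports neither ensemble weights nor a decision threshold. AUC is computed through its rank interpretation: the probability that a randomly chosen ROP case scores higher than a randomly chosen non-case, with ties counted as half.

```python
import math
from typing import Sequence


def soft_vote(prob_lists: Sequence[Sequence[float]]) -> list[float]:
    """Unweighted mean of each model's predicted probability, per infant."""
    n_models = len(prob_lists)
    return [sum(p) / n_models for p in zip(*prob_lists)]


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as P(positive scores higher than negative); ties count 0.5.

    Compares every positive/negative pair, which is fine for a small sketch.
    """
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = sum(1.0 if ps > ns else 0.5 if ps == ns else 0.0
               for ps in pos for ns in neg)
    return wins / (len(pos) * len(neg))


def f1_and_mcc(y_true: Sequence[int], y_pred: Sequence[int]) -> tuple[float, float]:
    """F1-score and Matthews correlation coefficient from a 2x2 confusion matrix."""
    tp = sum(1 for y, p in zip(y_true, y_pred) if y == 1 and p == 1)
    tn = sum(1 for y, p in zip(y_true, y_pred) if y == 0 and p == 0)
    fp = sum(1 for y, p in zip(y_true, y_pred) if y == 0 and p == 1)
    fn = sum(1 for y, p in zip(y_true, y_pred) if y == 1 and p == 0)
    f1 = 2 * tp / (2 * tp + fp + fn) if tp else 0.0
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0  # 0 when undefined
    return f1, mcc


# Toy data (invented, not from the study): 1 = ROP, 0 = no ROP.
y = [1, 0, 1, 0, 0, 1, 0, 0]
rf = [0.80, 0.30, 0.55, 0.20, 0.45, 0.35, 0.10, 0.60]
xgb = [0.75, 0.25, 0.60, 0.30, 0.50, 0.40, 0.05, 0.55]
lgbm = [0.85, 0.35, 0.50, 0.15, 0.40, 0.45, 0.15, 0.45]

ensemble = soft_vote([rf, xgb, lgbm])
y_hat = [1 if p >= 0.5 else 0 for p in ensemble]  # assumed 0.5 threshold
auc = roc_auc(y, ensemble)
f1, mcc = f1_and_mcc(y, y_hat)
print(f"AUC={auc:.3f} F1={f1:.3f} MCC={mcc:.3f}")  # AUC=0.867 F1=0.667 MCC=0.467
```

In the toy data, one ROP case falls below the threshold and one non-case falls above it. This lowers F1 and MCC while AUC stays high, because AUC does not depend on any threshold. In scikit-learn, `VotingClassifier(voting="soft")` performs the same averaging over members' `predict_proba` outputs and also accepts optional `weights`.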
