Retinopathy of prematurity (ROP) is a leading cause of childhood blindness. Chronic hypoxia at high altitude may create a distinct risk profile and influence the onset and progression of ROP. However, predictive tools designed specifically for preterm infants in high-altitude settings remain limited. This study aimed to develop and validate machine learning (ML)-based models for individualized ROP risk prediction in this population. We retrospectively collected clinical data from 2,138 preterm infants who underwent standardized ROP screening at Qinghai Red Cross Hospital between May 2014 and May 2025. The cohort was divided in a 7:3 ratio into a training set (n = 1,496) and an independent testing set (n = 642). The prediction timepoint (t0) was defined as immediately before the first scheduled ROP fundus screening. Candidate predictors were first screened by univariate analysis, then by temporal plausibility assessment and LASSO logistic regression (λ_min), yielding 24 predictors for subsequent modeling. Nine ML algorithms (logistic regression, decision tree, random forest, XGBoost, LightGBM, support vector machine [SVM], Gaussian Naive Bayes [GaussianNB], multilayer perceptron [MLP], and TabNet) and a soft-voting ensemble (random forest + XGBoost + LightGBM) were trained using five-fold cross-validation, with hyperparameters tuned via Bayesian optimization. On the independent testing set, the random forest model achieved the highest discrimination (AUC = 0.850, 95% CI: 0.795–0.906; PR-AUC = 0.607). XGBoost and the soft-voting ensemble had similar AUCs (both 0.845), with XGBoost providing the best overall classification balance (F1-score = 0.622; MCC = 0.573). ML models based on routinely available pre-screening clinical variables can effectively stratify ROP risk in preterm infants at high altitude.
Anchoring predictions to the timepoint immediately before the first scheduled ROP examination enhances clinical interpretability and temporal validity, potentially enabling more efficient allocation of screening resources in this setting.
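The abstract names a soft-voting ensemble and reports F1-score and MCC, but it does not state the combination weights or the classification threshold. The following is a minimal standard-library Python sketch, assuming equal-weight averaging of the three base models' predicted probabilities and a hypothetical threshold of 0.5. The function names (`soft_vote`, `confusion_counts`, `f1_score`, `mcc`) and the toy probabilities are illustrative only, not the authors' code or data.

```python
import math


def soft_vote(prob_lists, weights=None):
    """Combine per-model ROP probabilities by (weighted) averaging.

    prob_lists: one list of predicted probabilities per base model,
    all aligned to the same infants. Equal weights are an assumption.
    """
    if weights is None:
        weights = [1.0] * len(prob_lists)
    total = sum(weights)
    n_samples = len(prob_lists[0])
    return [
        sum(w * probs[i] for w, probs in zip(weights, prob_lists)) / total
        for i in range(n_samples)
    ]


def confusion_counts(y_true, y_prob, threshold=0.5):
    """Return (tp, fp, tn, fn) after thresholding probabilities.

    The 0.5 default is a hypothetical choice, not taken from the study.
    """
    tp = fp = tn = fn = 0
    for y, p in zip(y_true, y_prob):
        predicted = 1 if p >= threshold else 0
        if predicted == 1 and y == 1:
            tp += 1
        elif predicted == 1 and y == 0:
            fp += 1
        elif predicted == 0 and y == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn


def f1_score(tp, fp, fn):
    """F1 = 2TP / (2TP + FP + FN); returns 0 when the denominator is 0."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def mcc(tp, fp, tn, fn):
    """Matthews correlation coefficient; returns 0 if any marginal is empty."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


# Toy example: made-up probabilities for five infants (1 = ROP).
y_true = [1, 0, 1, 0, 1]
rf = [0.9, 0.2, 0.6, 0.1, 0.3]
xgb = [0.8, 0.3, 0.4, 0.2, 0.2]
lgbm = [0.7, 0.1, 0.7, 0.3, 0.4]

ensemble = soft_vote([rf, xgb, lgbm])
tp, fp, tn, fn = confusion_counts(y_true, ensemble)
print(f"F1 = {f1_score(tp, fp, fn):.3f}, MCC = {mcc(tp, fp, tn, fn):.3f}")
```

Soft voting averages probabilities rather than hard labels, so a single confident base model can move the ensemble across the threshold. Unlike F1, MCC uses all four confusion-matrix cells, including true negatives.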
Yu et al. studied this question.