Neonatal morbidity and mortality in the first month of life remain a major global health challenge. This study introduces NeoRisk, a machine learning framework that predicts daily neonatal risk levels (“Healthy” or “At Risk”) using longitudinal monitoring data from 100 newborns over 30 days, including vital signs, growth metrics, feeding patterns, and jaundice levels. Initial tabular models (Logistic Regression, Random Forest, XGBoost) produced near-perfect results (ROC AUC ≈ 1.000), but detailed investigation revealed severe data leakage, primarily from jaundice values that were indirectly embedded in the risk labels. After systematically removing these leaking features, realistic performance emerged in the 0.85–0.94 ROC AUC range. To capture temporal physiological patterns, a complementary LSTM time-series model was developed using 7-day historical sequences, avoiding static leakage while modeling dynamic changes. Key predictors shifted from jaundice (pre-leakage) to weight change and gestational age (post-leakage). NeoRisk offers a reproducible, clinically meaningful pipeline for early risk stratification and demonstrates the value of leakage detection, class imbalance handling, and longitudinal deep learning in neonatal care.
Agha Wafa Abbas (Fri,) studied this question.
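
The abstract does not say how the leaking features were identified. As an illustration only, one common screening step is to check whether any single raw feature separates the labels almost perfectly by itself. The sketch below assumes binary labels (1 = "At Risk", 0 = "Healthy"). The column names, the toy values, and the 0.99 threshold are hypothetical and are not taken from the study.

```python
from typing import Dict, List, Sequence


def single_feature_auc(values: Sequence[float], labels: Sequence[int]) -> float:
    """ROC AUC of one raw feature used directly as a score.

    Uses the pairwise (Mann-Whitney) form: the fraction of
    (positive, negative) pairs in which the positive example has the
    larger value, with ties counted as half a win.
    """
    pos = [v for v, y in zip(values, labels) if y == 1]
    neg = [v for v, y in zip(values, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one example of each class")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def flag_leaky_features(
    columns: Dict[str, Sequence[float]],
    labels: Sequence[int],
    threshold: float = 0.99,  # assumed cutoff, not from the study
) -> List[str]:
    """Return names of features whose single-feature AUC is suspiciously high."""
    flagged = []
    for name, values in columns.items():
        auc = single_feature_auc(values, labels)
        # A feature can leak in either direction, so also check 1 - AUC.
        if max(auc, 1.0 - auc) >= threshold:
            flagged.append(name)
    return flagged


# Toy, made-up data: six daily records with hypothetical column names.
toy_labels = [0, 0, 0, 1, 1, 1]
toy_columns = {
    "jaundice_mg_dl": [4.0, 5.5, 6.0, 13.0, 15.2, 14.1],
    "weight_change_g": [20.0, -5.0, 15.0, 10.0, -10.0, 5.0],
}
suspects = flag_leaky_features(toy_columns, toy_labels)
```

A flagged feature is only a candidate: a near-perfect single-feature AUC can also come from a genuinely strong predictor, so the feature's relationship to the labeling rule still has to be checked before it is dropped.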