What question did this study set out to answer?

This research aims to develop a machine learning model that effectively predicts neonatal health risks based on monitoring data.

February 2, 2026 | Open Access

Predictive Analytics for Neonatal Health Risk Assessment: A Machine Learning Approach to Early Detection of Critical Health Conditions in Newborns

Key Points

Developed NeoRisk, a machine learning framework that classifies each newborn's daily risk level as “Healthy” or “At Risk”.
Analyzed longitudinal data from 100 newborns over 30 days, tracking vital signs, growth metrics, feeding patterns, and jaundice levels.
Trained tabular models (Logistic Regression, Random Forest, XGBoost), then added an LSTM time-series model on 7-day historical sequences to capture physiological changes.
Initial tabular models reached a near-perfect ROC AUC (≈ 1.000) because of data leakage; once the leaking features were removed, ROC AUC fell to a realistic 0.85–0.94.
Key predictors shifted from jaundice (before leakage removal) to weight change and gestational age (after leakage removal).
Demonstrated the value of leakage detection and class imbalance handling in the neonatal health context.
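One common way to screen for the kind of leakage described above is to score each feature on its own: a single feature that separates the classes almost perfectly is a warning sign that it may be embedded in the label. The sketch below is illustrative only and is not the study's actual procedure. The synthetic data, the feature names, the jaundice cutoff of 12, and the 0.99 flagging threshold are all assumptions made for the example.

```python
import random


def roc_auc(scores, labels):
    """ROC AUC via the rank-sum (Mann-Whitney U) formulation, averaging tied ranks."""
    pairs = sorted(zip(scores, labels))
    n = len(pairs)
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and pairs[j + 1][0] == pairs[i][0]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based average rank of the tie group
        for k in range(i, j + 1):
            ranks[k] = avg_rank
        i = j + 1
    n_pos = sum(label for _, label in pairs)
    n_neg = n - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("both classes must be present")
    pos_rank_sum = sum(r for r, (_, label) in zip(ranks, pairs) if label == 1)
    return (pos_rank_sum - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


def flag_suspect_features(features, labels, threshold=0.99):
    """Return {name: auc} for features whose single-feature AUC is near 0 or 1."""
    suspects = {}
    for name, values in features.items():
        auc = roc_auc(values, labels)
        if max(auc, 1.0 - auc) >= threshold:
            suspects[name] = auc
    return suspects


# Synthetic, hypothetical data: the label is derived directly from jaundice,
# which mimics a feature that is embedded in the risk label.
random.seed(0)
n_rows = 200
jaundice = [random.uniform(0, 20) for _ in range(n_rows)]
labels = [1 if value > 12 else 0 for value in jaundice]  # assumed cutoff
features = {
    "jaundice_level": jaundice,
    "weight_change": [random.gauss(0, 1) - 0.8 * y for y in labels],
    "gestational_age": [random.gauss(38, 2) - 1.0 * y for y in labels],
}

suspects = flag_suspect_features(features, labels)
print(suspects)
```

In this toy setup only the jaundice feature is flagged, while the noisier features carry real but imperfect signal. A flagged feature is a prompt for manual review of how the label was constructed, not proof of leakage by itself.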

Abstract

Neonatal morbidity and mortality in the first month of life remain a major global health challenge. This study introduces NeoRisk, a machine learning framework that predicts daily neonatal risk levels (“Healthy” or “At Risk”) using longitudinal monitoring data from 100 newborns over 30 days, including vital signs, growth metrics, feeding patterns, and jaundice levels. Initial tabular models (Logistic Regression, Random Forest, XGBoost) produced near-perfect results (ROC AUC ≈ 1.000), but detailed investigation revealed severe data leakage, primarily from jaundice values that were indirectly embedded in the risk labels. After systematically removing these leaking features, realistic performance emerged in the 0.85–0.94 ROC AUC range. To capture temporal physiological patterns, a complementary LSTM time-series model was developed using 7-day historical sequences, avoiding static leakage while modeling dynamic changes. Key predictors shifted from jaundice (pre-leakage) to weight change and gestational age (post-leakage). NeoRisk offers a reproducible, clinically meaningful pipeline for early risk stratification and demonstrates the value of leakage detection, class imbalance handling, and longitudinal deep learning in neonatal care.
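The 7-day historical sequences mentioned in the abstract can be pictured as sliding windows built separately for each newborn. The following is a minimal sketch under stated assumptions, not the study's pipeline: the field names (`baby_id`, `day`, `weight_kg`, `heart_rate`, `label`) and the demo values are hypothetical, each newborn is assumed to have one record per consecutive day, and the target is assumed to be the label on the day immediately after the window.

```python
from collections import defaultdict


def build_sequences(rows, feature_names, window=7):
    """Turn daily records into (X, y) pairs for a sequence model.

    X[i] holds `window` consecutive daily feature vectors for one newborn;
    y[i] is that newborn's label on the following day (an assumption here).
    Windows never span two different newborns.
    """
    by_baby = defaultdict(list)
    for row in rows:
        by_baby[row["baby_id"]].append(row)

    X, y = [], []
    for baby_id in sorted(by_baby):
        days = sorted(by_baby[baby_id], key=lambda r: r["day"])
        for end in range(window, len(days)):
            history = days[end - window:end]
            X.append([[r[name] for name in feature_names] for r in history])
            y.append(days[end]["label"])
    return X, y


# Hypothetical demo data shaped like the study's cohort: 100 newborns, 30 days.
rows = [
    {
        "baby_id": baby,
        "day": day,
        "weight_kg": 3.0 + 0.02 * day,
        "heart_rate": 140 - day % 5,
        "label": int(day % 10 == 0),
    }
    for baby in range(100)
    for day in range(1, 31)
]

X, y = build_sequences(rows, ["weight_kg", "heart_rate"], window=7)
print(len(X), len(X[0]), len(X[0][0]))
```

With 30 days per newborn and a 7-day window, each newborn contributes 23 sequences. When splitting such data for evaluation, keeping all of one newborn's sequences in the same split avoids a different form of leakage, since overlapping windows from the same infant are highly correlated.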
