What is the clinical evidence from this study?

Study design: Observational. Population: Pediatric critical illness (n=14,237). Intervention: Random forest machine learning model vs. Pediatric Logistic Organ Dysfunction-2 (PELOD-2) score. Primary outcome: PICU mortality prediction, AUC 0.867 (95% CI 0.863-0.895) vs. 0.761 for PELOD-2, p=0.003.

May 1, 2021 | Open Access

A Machine Learning Classifier Improves Mortality Prediction Compared With Pediatric Logistic Organ Dysfunction-2 Score: Model Development and Validation

Key Result

A random forest machine learning model significantly outperformed the Pediatric Logistic Organ Dysfunction-2 score in predicting PICU mortality, achieving an AUC of 0.867 compared to 0.761.

Study Design

Type

Observational (n=14,237)

Single-center

Structured PICO

Does a machine learning model improve prediction of PICU mortality compared to the Pediatric Logistic Organ Dysfunction-2 (PELOD-2) score in critically ill pediatric patients?

Population

14,237 pediatric patients admitted to a quaternary care medical-surgical PICU from 2013 to 2019 (10,194 in training cohort, 4,043 in validation cohort), excluding patients whose primary reason for admission relates to congenital or acquired pediatric heart disease.
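The abstract describes the two cohorts as a split by admission period (2013 to 2017 for training, 2018 to 2019 as the holdout). A minimal sketch of such a temporal split follows; the record fields (`patient_id`, `admission_year`, `died`) and the function name are hypothetical and are not taken from the study.

```python
from typing import Dict, List, Tuple

Record = Dict[str, int]


def temporal_split(
    records: List[Record], last_training_year: int = 2017
) -> Tuple[List[Record], List[Record]]:
    """Split by admission year: earlier years train, later years are held out."""
    train = [r for r in records if r["admission_year"] <= last_training_year]
    holdout = [r for r in records if r["admission_year"] > last_training_year]
    return train, holdout


# Hypothetical toy records; real data would also carry the PELOD-2 variables.
records = [
    {"patient_id": 1, "admission_year": 2013, "died": 0},
    {"patient_id": 2, "admission_year": 2016, "died": 1},
    {"patient_id": 3, "admission_year": 2017, "died": 0},
    {"patient_id": 4, "admission_year": 2018, "died": 0},
    {"patient_id": 5, "admission_year": 2019, "died": 1},
]

train, holdout = temporal_split(records)
print(len(train), len(holdout))
```

Splitting by time rather than at random keeps later admissions out of training, so the holdout behaves more like future patients the model has not seen.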

Intervention

Machine learning algorithms (specifically a random forest model, PELODRF) using the same variables as the Pediatric Logistic Organ Dysfunction-2 (PELOD-2) score to predict PICU mortality.

Comparator

Pediatric Logistic Organ Dysfunction-2 (PELOD-2) score and a locally retrained PELOD-2 logistic regression model.

Outcome

PICU mortality during the index PICU admission (hard clinical outcome).

A random forest machine learning model using standard clinical variables provides superior discrimination and calibration for predicting PICU mortality compared to the traditional logistic regression-based PELOD-2 score.

Main Result

AUC: 0.867 (random forest) vs 0.761 (PELOD-2)

p-value: p=0.003

Limitations

Frequent missing values typical of ICU databases
Imputation method assumed that missing values were within the normal range
Limited to predictors included in the PELOD-2 model
Single-center study requiring external validation
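The imputation limitation can be illustrated with a short sketch of "assume normal when missing" filling. The variable names and the default values below are placeholders chosen for illustration; they are not the study's variable definitions and not clinical reference ranges.

```python
from typing import Dict, Optional

# Placeholder "normal" values for illustration only; NOT clinical reference ranges.
ASSUMED_NORMAL: Dict[str, float] = {
    "glasgow_coma_score": 15.0,
    "lactate_mmol_per_l": 1.0,
    "platelets_10e9_per_l": 250.0,
}


def impute_as_normal(record: Dict[str, Optional[float]]) -> Dict[str, float]:
    """Replace missing (absent or None) measurements with an assumed normal value."""
    filled: Dict[str, float] = {}
    for name, default in ASSUMED_NORMAL.items():
        value = record.get(name)
        filled[name] = default if value is None else value
    return filled


# Hypothetical patient: one measured value, one explicit None, one absent key.
patient = {"glasgow_coma_score": 9.0, "lactate_mmol_per_l": None}
filled = impute_as_normal(patient)
print(filled)
```

The trade-off is visible in the example: an unmeasured value is treated as healthy, so a patient whose abnormal result was never recorded may look lower-risk than they are.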

Abstract

OBJECTIVES: To determine whether machine learning algorithms can better predict PICU mortality than the Pediatric Logistic Organ Dysfunction-2 score.

DESIGN: Retrospective study.

SETTING: Quaternary care medical-surgical PICU.

PATIENTS: All patients admitted to the PICU from 2013 to 2019.

INTERVENTIONS: None.

MEASUREMENTS AND MAIN RESULTS: We investigated the performance of various machine learning algorithms using the same variables used to calculate the Pediatric Logistic Organ Dysfunction-2 score to predict PICU mortality. We used 10,194 patient records from 2013 to 2017 for training and 4,043 patient records from 2018 to 2019 as a holdout validation cohort. Mortality rate was 3.0% in the training cohort and 3.4% in the validation cohort. The best performing algorithm was a random forest model (area under the receiver operating characteristic curve, 0.867 [95% CI, 0.863-0.895]; area under the precision-recall curve, 0.327 [95% CI, 0.246-0.414]; F1, 0.396 [95% CI, 0.321-0.468]) and significantly outperformed the Pediatric Logistic Organ Dysfunction-2 score (area under the receiver operating characteristic curve, 0.761 [95% CI, 0.713-0.810]; area under the precision-recall curve, 0.239 [95% CI, 0.165-0.316]; F1, 0.284 [95% CI, 0.209-0.360]), although this difference was reduced after retraining the Pediatric Logistic Organ Dysfunction-2 logistic regression model at the study institution. The random forest model also showed better calibration than the Pediatric Logistic Organ Dysfunction-2 score, and calibration of the random forest model remained superior to the retrained Pediatric Logistic Organ Dysfunction-2 model.

CONCLUSIONS: A machine learning model achieved better performance than a logistic regression-based score for predicting ICU mortality. Better estimation of mortality risk can improve our ability to adjust for severity of illness in future studies, although external validation is required before this method can be widely deployed.
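For readers less familiar with the metrics reported in the abstract, the sketch below shows how an area under the ROC curve and an F1 score can be computed from predicted risks and observed outcomes. It uses only the Python standard library; the labels, scores, and the 0.5 threshold are made-up toy values, and nothing here reproduces the study's data or its evaluation code.

```python
from typing import Sequence


def auroc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Probability that a random positive outranks a random negative (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def f1_score(labels: Sequence[int], scores: Sequence[float], threshold: float = 0.5) -> float:
    """Harmonic mean of precision and recall when scores >= threshold are called positive."""
    tp = sum(1 for y, s in zip(labels, scores) if y == 1 and s >= threshold)
    fp = sum(1 for y, s in zip(labels, scores) if y == 0 and s >= threshold)
    fn = sum(1 for y, s in zip(labels, scores) if y == 1 and s < threshold)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy example: 1 = died, 0 = survived; scores are hypothetical predicted risks.
labels = [0, 0, 0, 0, 1, 0, 1, 1]
scores = [0.05, 0.10, 0.20, 0.40, 0.15, 0.60, 0.70, 0.90]

print(round(auroc(labels, scores), 3))
print(round(f1_score(labels, scores, threshold=0.5), 3))
```

The pairwise AUROC above is quadratic in cohort size and is meant for clarity, not for a cohort of thousands. As context for the precision-recall figures, a classifier with no skill has an expected area under the precision-recall curve roughly equal to the outcome prevalence, which would be about 0.034 given the 3.4% mortality in the validation cohort.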
