A deep recurrent neural network model for predicting transfer to ICU or death showed very poor performance (AUPRC 0.042; 95% CI, 0.04-0.043), comparable to logistic regression and only modestly better than standard early warning scores.
Cohort (n=146,446)
Yes
Does a deep recurrent neural network model improve the real-time prediction of clinical deterioration (transfer to ICU or death) in inpatient adults compared to standard early warning scores?
Commonly used early warning scores and deep learning models show very poor performance for real-time prediction of clinical deterioration in ward patients when assessed using simulated prospective validation.
Effect estimate: AUPRC 0.042 (95% CI 0.04-0.043)
OBJECTIVES: The National Early Warning Score (NEWS), Modified Early Warning Score (MEWS), and quick Sepsis-related Organ Failure Assessment (qSOFA) can predict clinical deterioration. These scores exhibit only moderate performance and are often evaluated using aggregated measures over time. A simulated prospective validation strategy that assesses multiple predictions per patient-day would provide the best pragmatic evaluation. We developed a deep recurrent neural network deterioration model and conducted a simulated prospective evaluation.
DESIGN: Retrospective cohort study.
SETTING: Four hospitals in Pennsylvania.
PATIENTS: Inpatient adults discharged between July 1, 2017, and June 30, 2019.
INTERVENTIONS: None.
MEASUREMENTS AND MAIN RESULTS: We trained a deep recurrent neural network and a logistic regression model on electronic health record data to predict, every hour, the composite outcome of transfer to ICU or death within the next 24 hours. We analyzed 146,446 hospitalizations comprising 16.75 million patient-hours. The hourly event rate was 1.6% (12,842 transfers or deaths, corresponding to 260,295 patient-hours within the predictive horizon). On a hold-out dataset, the deep recurrent neural network achieved an area under the precision-recall curve (AUPRC) of 0.042 (95% CI, 0.04-0.043), comparable with that of the logistic regression model (0.043; 95% CI, 0.041-0.045), and outperformed the National Early Warning Score (0.034; 95% CI, 0.032-0.035), Modified Early Warning Score (0.028; 95% CI, 0.027-0.03), and quick Sepsis-related Organ Failure Assessment (0.021; 95% CI, 0.021-0.022). At a fixed sensitivity of 50%, the deep recurrent neural network achieved a positive predictive value of 3.4% (95% CI, 3.4-3.5) and outperformed the logistic regression model (3.1%; 95% CI, 3.1-3.2), National Early Warning Score (2.0%; 95% CI, 2.0-2.0), Modified Early Warning Score (1.5%; 95% CI, 1.5-1.5), and quick Sepsis-related Organ Failure Assessment (1.5%; 95% CI, 1.5-1.5).
CONCLUSIONS: Commonly used early warning scores for clinical decompensation, along with a logistic regression model and a deep recurrent neural network model, show very poor performance characteristics when assessed using a simulated prospective validation. None of these models may be suitable for real-time deployment.
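The reported figures can be put in context with a little arithmetic. The following Python sketch is illustrative only and is not the authors' code: the function names are hypothetical, the patient-hour count is the rounded value from the abstract (so the derived rate is approximate), and the comparison relies on the general fact that a classifier with no skill has an expected AUPRC roughly equal to the prevalence of the outcome. The last function shows one plausible way to read "positive predictive value at a fixed sensitivity": lower the alert threshold until the target share of positive patient-hours is captured, then report the precision at that point.

```python
"""Sanity checks for figures quoted in the abstract (illustrative sketch only)."""

# Values copied from the abstract; patient-hours is rounded there (16.75 million).
PATIENT_HOURS = 16_750_000
POSITIVE_HOURS = 260_295
AUPRC = {"RNN": 0.042, "LR": 0.043, "NEWS": 0.034, "MEWS": 0.028, "qSOFA": 0.021}


def hourly_event_rate(positive_hours: int, patient_hours: int) -> float:
    """Share of patient-hours that fall within the 24-hour predictive horizon."""
    return positive_hours / patient_hours


def lift_over_prevalence(auprc: float, prevalence: float) -> float:
    """Ratio of an AUPRC to the approximate no-skill baseline (the prevalence)."""
    return auprc / prevalence


def ppv_at_sensitivity(labels, scores, target_sensitivity=0.5):
    """PPV at the highest score threshold whose sensitivity reaches the target.

    labels: iterable of 0/1 outcomes, one per prediction (e.g. per patient-hour).
    scores: iterable of model scores, higher meaning higher predicted risk.
    """
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    total_pos = sum(label for _, label in ranked)
    if total_pos == 0:
        raise ValueError("no positive labels; sensitivity is undefined")

    tp = fp = 0
    i = 0
    while i < len(ranked):
        threshold = ranked[i][0]
        # Predictions tied at the same score are alerted together.
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1]:
                tp += 1
            else:
                fp += 1
            i += 1
        if tp / total_pos >= target_sensitivity:
            return tp / (tp + fp)
    raise ValueError("target sensitivity must not exceed 1")


if __name__ == "__main__":
    rate = hourly_event_rate(POSITIVE_HOURS, PATIENT_HOURS)
    print(f"hourly event rate: {rate:.1%}")
    for name, value in AUPRC.items():
        print(f"{name}: AUPRC {value:.3f}, lift {lift_over_prevalence(value, rate):.2f}x")
```

Read this way, the best models reach an AUPRC of less than three times the no-skill baseline, and a positive predictive value of 3.4% corresponds to roughly 29 alerts for every correctly flagged patient-hour, which is consistent with the authors' conclusion.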
Shah et al. conducted a cohort study of clinical deterioration (n=146,446). A deep recurrent neural network deterioration model was compared with a logistic regression model, NEWS, MEWS, and qSOFA on the 24-hour composite outcome of transfer to ICU or death (AUPRC 0.042; 95% CI, 0.04-0.043).