
May 14, 2026 | Open Access

Interpretable machine learning for predicting in-hospital mortality in COPD ICU patients: a rigorous validation across time and geography

Key Points

The study aims to develop and validate machine learning models to predict in-hospital mortality risk in ICU patients with COPD.
Used three public critical care databases: MIMIC-IV for model development and internal validation, MIMIC-III for temporal validation, and eICU for external validation.
Employed 12 machine learning algorithms, including CatBoost, with evaluation based on AUC, calibration curves, and decision curve analysis.
Selected core predictors using collinearity analysis, the Boruta algorithm, and recursive feature elimination, and addressed class imbalance with cost-sensitive learning.
CatBoost achieved an AUC of 0.753 (95% CI: 0.722–0.784) in MIMIC-IV for internal validation.
CatBoost significantly outperformed the SAPS II score in MIMIC-IV: ΔAUC = +0.063 (P < 0.001).
SHAP analysis highlighted SAPS II, respiratory rate, and blood urea nitrogen as key predictors.
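The study applied cost-sensitive learning to the MIMIC-IV training set but the summary does not say which weighting scheme was used. A minimal Python sketch, assuming the common "balanced" rule in which each class is weighted by n / (number of classes × class count); the helpers `balanced_class_weights` and `weighted_log_loss` and the toy cohort are hypothetical, not the authors' code:

```python
import math
from collections import Counter


def balanced_class_weights(labels):
    """Hypothetical helper: weight each class by n / (n_classes * n_class).

    Rare classes (here, in-hospital deaths) get proportionally larger weights,
    so every class contributes the same total weight to the training loss.
    """
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {cls: n / (k * c) for cls, c in counts.items()}


def weighted_log_loss(labels, probs, weights):
    """Class-weighted binary cross-entropy, normalised by the total weight."""
    eps = 1e-15
    total = norm = 0.0
    for y, p in zip(labels, probs):
        p = min(max(p, eps), 1 - eps)  # clip to avoid log(0)
        w = weights[y]
        total += -w * (y * math.log(p) + (1 - y) * math.log(1 - p))
        norm += w
    return total / norm


# Toy cohort (hypothetical): 2 deaths (label 1) among 10 admissions.
y = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
w = balanced_class_weights(y)
print(w)  # → {1: 2.5, 0: 0.625}
# Constant prediction at the 20% base rate, scored with class weights:
print(round(weighted_log_loss(y, [0.2] * 10, w), 3))  # → 0.916
```

Each class ends up with the same total weight (2 × 2.5 = 8 × 0.625 = 5), so errors on the rare deaths are no longer swamped by the survivors. Gradient-boosting libraries generally accept such per-class or per-sample weights at fit time.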

Abstract

Chronic obstructive pulmonary disease (COPD) is a common reason for admission to the intensive care unit (ICU), where accurate risk stratification is crucial for clinical decision-making. This study aimed to develop and validate machine learning models for predicting in-hospital mortality in ICU patients with COPD using multicenter critical care databases, and to evaluate their incremental value and clinical utility. This was a multicenter retrospective study using data from three public databases: MIMIC-IV (model development and internal validation), MIMIC-III (internal temporal validation), and eICU (external validation). Adults (aged ≥ 18 years) at their first ICU admission with a COPD diagnosis identified by ICD codes were included; those with an ICU length of stay < 24 h were excluded. The primary outcome was in-hospital mortality. Core predictors were selected through collinearity analysis, the Boruta algorithm, and recursive feature elimination with tenfold nested cross-validation. Twelve algorithms were used for model development, and cost-sensitive learning was applied to address class imbalance in the MIMIC-IV training set. Model performance was evaluated using the area under the receiver operating characteristic curve (AUC), calibration curves (calibration intercept and slope), the Brier score, and decision curve analysis (DCA). AUCs of competing models were compared with the DeLong test, and the integrated discrimination improvement (IDI) quantified the model's incremental value over the SAPS II score. SHapley Additive exPlanations (SHAP) were used for model interpretation. A total of 7,900 patients from MIMIC-IV, 1,979 from MIMIC-III, and 8,491 from eICU were included. Thirteen core predictors were ultimately selected (e.g., SAPS II, respiratory rate, heart rate, blood urea nitrogen, and lactate).
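The DeLong test compares two AUCs computed on the same patients, which is the setting for CatBoost versus SAPS II here. A minimal pure-Python sketch of the standard DeLong variance for a paired AUC difference, plus IDI computed as the gain in discrimination slope (mean predicted risk in deaths minus mean predicted risk in survivors). The function names and toy data are hypothetical; the IDI part assumes SAPS II has already been converted to a predicted probability, which the summary does not describe. The pairwise loops are quadratic, so a sorting-based implementation would be preferable at the cohort sizes reported.

```python
import math


def _psi(x, y):
    """Mann-Whitney kernel: 1 if the death outranks the survivor, 0.5 on ties."""
    return 1.0 if x > y else 0.5 if x == y else 0.0


def auc(pos, neg):
    """AUC = probability that a random death scores above a random survivor."""
    return sum(_psi(x, y) for x in pos for y in neg) / (len(pos) * len(neg))


def _components(pos, neg):
    """DeLong structural components: V10 per death, V01 per survivor."""
    v10 = [sum(_psi(x, y) for y in neg) / len(neg) for x in pos]
    v01 = [sum(_psi(x, y) for x in pos) / len(pos) for y in neg]
    return v10, v01


def _var(v):
    """Sample variance with an n - 1 denominator."""
    mean = sum(v) / len(v)
    return sum((u - mean) ** 2 for u in v) / (len(v) - 1)


def delong_test(labels, score_a, score_b):
    """Two-sided DeLong test for two correlated AUCs on the same patients.

    Returns (auc_a, auc_b, z, p_value). Needs at least 2 deaths and 2 survivors.
    """
    pos = [i for i, t in enumerate(labels) if t == 1]
    neg = [i for i, t in enumerate(labels) if t == 0]
    v10a, v01a = _components([score_a[i] for i in pos], [score_a[i] for i in neg])
    v10b, v01b = _components([score_b[i] for i in pos], [score_b[i] for i in neg])
    auc_a, auc_b = sum(v10a) / len(pos), sum(v10b) / len(pos)
    # Var(AUC_a - AUC_b) from the paired differences of the components.
    var = (_var([a - b for a, b in zip(v10a, v10b)]) / len(pos)
           + _var([a - b for a, b in zip(v01a, v01b)]) / len(neg))
    if var <= 0:
        raise ValueError("AUC difference has zero estimated variance")
    z = (auc_a - auc_b) / math.sqrt(var)
    p_value = math.erfc(abs(z) / math.sqrt(2))  # = 2 * (1 - Phi(|z|))
    return auc_a, auc_b, z, p_value


def idi(labels, p_new, p_old):
    """IDI: gain in discrimination slope (mean risk in deaths minus survivors)."""
    def slope(p):
        ev = [pi for pi, t in zip(p, labels) if t == 1]
        ne = [pi for pi, t in zip(p, labels) if t == 0]
        return sum(ev) / len(ev) - sum(ne) / len(ne)
    return slope(p_new) - slope(p_old)


# Hypothetical toy data: 3 deaths, 5 survivors, two risk predictions each.
y = [1, 1, 1, 0, 0, 0, 0, 0]
model = [0.9, 0.7, 0.4, 0.5, 0.3, 0.2, 0.2, 0.1]
saps = [0.6, 0.3, 0.5, 0.6, 0.4, 0.3, 0.2, 0.2]
auc_a, auc_b, z, p = delong_test(y, model, saps)
print(round(auc_a, 3), round(auc_b, 3))  # → 0.933 0.733
print(round(z, 2), round(p, 3))
print(round(idi(y, model, saps), 2))  # → 0.28
```

Because both AUCs are estimated on the same patients, the variance uses the paired differences of the components; treating the two AUCs as independent would overstate the variance of their difference when the scores are positively correlated.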
The CatBoost model showed the most consistent performance across the three validation sets, with an AUC of 0.753 (95% CI: 0.722–0.784) in internal validation, 0.731 (95% CI: 0.701–0.760) in temporal validation, and 0.735 (95% CI: 0.718–0.751) in external validation. After recalibration, predictive accuracy improved in both MIMIC-III (Brier score decreased from 0.16 to 0.14) and eICU (Brier score decreased from 0.11 to 0.09). DCA indicated a net clinical benefit for the model at threshold probabilities from 0.10 to 0.60. Compared with the SAPS II score, CatBoost yielded significant gains in both AUC and IDI in the MIMIC-IV (ΔAUC = +0.063, P < 0.001; IDI = 0.066, P < 0.001) and MIMIC-III (ΔAUC = +0.044, P < 0.001; IDI = 0.058, P < 0.001) cohorts. SHAP analysis identified SAPS II, respiratory rate, and blood urea nitrogen as the main drivers of predicted risk. An online risk calculator based on this model has been publicly deployed. This study developed a CatBoost model for predicting in-hospital mortality in ICU patients with COPD using multicenter data. The model showed good discrimination, calibration, and clinical utility in cross-institutional and cross-temporal validation, with performance superior to the traditional SAPS II score. The online tool, integrated with SHAP explanations, can provide clinicians with individualized risk predictions.
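The summary reports Brier scores before and after calibration and net benefit over thresholds of 0.10 to 0.60, but does not name the recalibration method. A minimal sketch, assuming logistic recalibration (refitting an intercept a and slope b on the logit of the predicted risk), which also yields the calibration intercept and slope listed in the methods; some authors instead estimate the intercept with the slope fixed at 1. Net benefit uses the standard decision-curve formula NB = TP/n - FP/n × pt/(1 - pt). All function names and the 20-patient toy cohort are hypothetical:

```python
import math


def brier(labels, probs):
    """Brier score: mean squared gap between predicted risk and outcome."""
    return sum((p - y) ** 2 for y, p in zip(labels, probs)) / len(labels)


def logit(p):
    return math.log(p / (1 - p))


def sigmoid(x):
    return 1 / (1 + math.exp(-x))


def calibration_intercept_slope(labels, probs, iters=50):
    """Fit outcome ~ a + b * logit(p) by Newton-Raphson; return (a, b).

    a near 0 and b near 1 suggest good calibration; b < 1 means the
    predictions are too extreme.
    """
    x = [logit(p) for p in probs]
    a, b = 0.0, 1.0  # start at "perfectly calibrated"
    for _ in range(iters):
        ga = gb = haa = hab = hbb = 0.0
        for xi, yi in zip(x, labels):
            mu = sigmoid(a + b * xi)
            r, w = yi - mu, mu * (1 - mu)
            ga, gb = ga + r, gb + r * xi  # score vector
            haa, hab, hbb = haa + w, hab + w * xi, hbb + w * xi * xi  # information
        det = haa * hbb - hab * hab
        da = (hbb * ga - hab * gb) / det  # solve the 2x2 Newton system
        db = (haa * gb - hab * ga) / det
        a, b = a + da, b + db
        if abs(da) < 1e-10 and abs(db) < 1e-10:
            break
    return a, b


def recalibrate(probs, a, b):
    """Logistic recalibration: p' = sigmoid(a + b * logit(p))."""
    return [sigmoid(a + b * logit(p)) for p in probs]


def net_benefit(labels, probs, threshold):
    """Decision-curve net benefit of treating everyone with risk >= threshold."""
    n = len(labels)
    tp = sum(1 for y, p in zip(labels, probs) if p >= threshold and y == 1)
    fp = sum(1 for y, p in zip(labels, probs) if p >= threshold and y == 0)
    return tp / n - fp / n * threshold / (1 - threshold)


# Hypothetical overconfident model: observed death rates are 0.3 and 0.7,
# but the model predicts 0.1 and 0.9.
y = [1] * 3 + [0] * 7 + [1] * 7 + [0] * 3
p = [0.1] * 10 + [0.9] * 10
a, b = calibration_intercept_slope(y, p)
print(round(b, 2))  # → 0.39 (slope < 1: predictions too extreme)
print(round(brier(y, p), 2), round(brier(y, recalibrate(p, a, b)), 2))  # → 0.25 0.21
print(round(net_benefit(y, p, 0.5), 2))  # → 0.2
```

In this toy example a and b are fitted and evaluated on the same 20 patients, so the Brier improvement is optimistic. A decision curve is obtained by evaluating `net_benefit` over a grid of thresholds and comparing it with the "treat all" curve (all predictions set to 1) and "treat none" (net benefit 0).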
