What does this research mean for the field?

A random forest machine learning model using nine clinical predictors predicted early acute respiratory distress syndrome (ARDS) in critically ill patients with acute pancreatitis, with an AUC of 0.851 in internal validation and 0.823 in external validation. Novelty: methodological. Consensus alignment: neutral.

May 28, 2026 | Open Access

An interpretable machine learning model for predicting acute respiratory distress syndrome in critically ill patients with acute pancreatitis: A multicenter retrospective study

Key Points

This research develops and validates an interpretable machine learning model for predicting acute respiratory distress syndrome (ARDS) in patients with severe acute pancreatitis.
The study used retrospective data from the MIMIC-IV database for model development and internal validation, and an independent cohort from Changshu Hospital for external validation.
A hybrid feature selection strategy combining LASSO regression and the Boruta algorithm identified the key predictors.
Seven machine learning algorithms were built and compared on discrimination (AUC), calibration curves, and clinical utility via decision curve analysis (DCA).
The random forest model showed the best performance with an internal AUC of 0.851 and an external AUC of 0.823.
Nine independent predictors were identified: BMI, respiratory rate, temperature, SOFA score, white blood cell count, PO2, PCO2, mechanical ventilation, and antibiotic use.
Calibration curves demonstrated good agreement between predicted and observed probabilities, while DCA indicated superior net benefit across various clinical thresholds.
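
The net benefit that DCA reports at a risk threshold p_t is conventionally computed as TP/n - (FP/n) * p_t / (1 - p_t), and it is compared against a "treat everyone" strategy. The sketch below illustrates that standard formula only; the function names and the toy labels and probabilities are assumptions made for illustration and are not the study's code or data.

```python
def net_benefit(y_true, y_prob, threshold):
    """Net benefit of treating every patient whose predicted risk >= threshold."""
    if not 0.0 < threshold < 1.0:
        raise ValueError("threshold must be strictly between 0 and 1")
    n = len(y_true)
    # True positives: flagged by the model and actually developed the outcome.
    tp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 1)
    # False positives: flagged by the model but did not develop the outcome.
    fp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 0)
    # Odds of the threshold: how many false positives one true positive is worth.
    weight = threshold / (1.0 - threshold)
    return tp / n - (fp / n) * weight


def net_benefit_treat_all(y_true, threshold):
    """Net benefit of the reference strategy that treats every patient."""
    prevalence = sum(y_true) / len(y_true)
    return prevalence - (1.0 - prevalence) * threshold / (1.0 - threshold)


# Hypothetical toy cohort: 1 = ARDS, 0 = no ARDS (not data from the study).
toy_labels = [1, 0, 1, 0, 0, 1, 0, 0]
toy_probs = [0.9, 0.2, 0.7, 0.4, 0.1, 0.6, 0.3, 0.05]

for pt in (0.2, 0.5):
    model_nb = net_benefit(toy_labels, toy_probs, pt)
    all_nb = net_benefit_treat_all(toy_labels, pt)
    print(f"threshold={pt:.2f} model={model_nb:.3f} treat_all={all_nb:.3f}")
```

A model shows "superior net benefit" at a threshold when its value exceeds both the treat-all value and zero, which is the net benefit of treating no one.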

Abstract

Objective: Acute respiratory distress syndrome (ARDS) drives early mortality in severe acute pancreatitis (AP). Since conventional tools often fail to capture complex physiological interactions, we aimed to develop and validate an interpretable machine learning (ML) model for early ARDS prediction and deploy it as a web-based calculator.

Methods: This multicenter retrospective study utilized data from the MIMIC-IV database for model development and internal validation, and an independent cohort from Changshu Hospital for external validation. Optimal predictors were identified through a hybrid feature selection strategy combining LASSO regression and the Boruta algorithm. Seven ML algorithms were constructed, including random forest (RF), extreme gradient boosting, support vector machine, logistic regression, light gradient boosting machine, k-nearest neighbors, and decision trees. Model performance was evaluated by discrimination (AUC), calibration curves, and clinical utility (DCA). Model interpretability was assessed using SHapley Additive exPlanations (SHAP) and partial dependence plots (PDP).

Results: A total of 905 patients from the MIMIC-IV cohort (25.0% ARDS incidence) and 126 from the external cohort (20.6% incidence) were included. Nine independent predictors were identified: body mass index (BMI), respiratory rate, temperature, SOFA score, white blood cell count, PO2, PCO2, mechanical ventilation, and antibiotic use. The RF model demonstrated best performance (internal AUC 0.851) and maintained robust generalization in the external cohort (AUC 0.823). Calibration curves indicated good agreement between predicted and observed probabilities, and DCA showed superior net benefit across clinically relevant thresholds. SHAP analysis identified ventilation, SOFA score, BMI, PO2, and respiratory rate as the most influential predictors.

Conclusion: A high-performing, interpretable RF model was developed for early ARDS prediction in critically ill AP patients. The model effectively captured complex physiological interactions and demonstrated robustness across diverse populations. By integrating this algorithmic framework into a user-friendly web calculator, the tool supports personalized risk stratification and timely clinical decision-making.
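
For readers less familiar with the discrimination metric, the AUC can be read as the probability that a randomly chosen patient who developed ARDS receives a higher predicted risk than a randomly chosen patient who did not, with ties counted as one half. The sketch below computes it by direct pairwise comparison; the function name and the toy scores are hypothetical and are not taken from the study.

```python
def roc_auc(y_true, y_score):
    """AUC as the fraction of (positive, negative) pairs ranked correctly."""
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0   # positive case ranked above negative case
            elif p == q:
                wins += 0.5   # tie counts as half a correct ranking
    return wins / (len(pos) * len(neg))


# Hypothetical toy cohort: 1 = ARDS, 0 = no ARDS (not data from the study).
example_labels = [1, 0, 1, 0, 0, 1, 0, 0]
example_scores = [0.9, 0.2, 0.35, 0.4, 0.1, 0.6, 0.3, 0.05]

print(f"AUC = {roc_auc(example_labels, example_scores):.3f}")
```

The pairwise loop is quadratic in cohort size, which is fine for a few hundred patients; library implementations typically use a rank-based computation to reach the same value.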
