Abstract

Background
Acute respiratory failure (ARF) requiring respiratory support in an intensive care unit (ICU) is one of the most common causes of critical illness in the United States. Despite well-established best practices from randomized clinical trials, adherence to optimal respiratory strategies remains inconsistent. This is partly due to the complexity of ARF: patient physiology and ventilator settings evolve non-linearly and interdependently, requiring clinicians to make hundreds of time-sensitive decisions. Reinforcement learning (RL), which optimizes sequential decision-making toward short- and long-term outcomes, offers a promising framework to support this process.

Methods
We trained a double deep Q-network (DQN) RL agent to recommend ventilator settings for adult patients who received invasive mechanical ventilation for ≥4 hours, using the Common Longitudinal Intensive Care Unit Format (CLIF) version of the MIMIC-IV dataset. All variables were discretized into 1-hour bins, aligning with the agent's decision interval. The state space included each patient's most recent vitals, labs, ventilator settings, and continuous medications. The action space comprised a 2 × 2 grid defined by ventilator mode (controlled vs. uncontrolled) and oxygenation support (high vs. low PEEP/FiO2). The reward function captured changes in arterial pH, PaO2/FiO2 (P/F) ratio, and survival. We externally validated the RL agent's decisions across three independent CLIF-standardized ICU datasets by assessing whether higher physician-agent concordance was associated with lower odds of mortality, adjusting for age and baseline illness severity.

Results
In external validation using the University of Chicago dataset, each additional 10% of hours in which clinician decisions aligned with the RL agent's recommendations was associated with a 3.2% decrease in confounder-adjusted odds of mortality (95% CI, 0.9%-5.5%; p = 0.006). Similar patterns were observed in external validations using the Rush Medical Center and Northwestern datasets, showing 8.4% (95% CI, 5.7%-11.0%; p < 0.001) and 4.8% (95% CI, 2.6%-7.0%; p < 0.001) decreases, respectively (Fig 1). The RL agent more frequently selected uncontrolled ventilation modes, reserving controlled modes for patients receiving higher doses of sedation or exhibiting severe gas-exchange impairment (Fig 1B). Example physician actions and RL agent actions are presented in Fig 1C.

Conclusions
RL-guided personalized respiratory support was associated with improved outcomes for patients with ARF. Future research should focus on more granular action spaces incorporating additional clinical features and on rigorous prospective evaluation.

This abstract is funded by: NIH
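The Methods name a 2 × 2 action grid and a double DQN without giving implementation details. The sketch below shows one way those two pieces might fit together for a single hourly transition. The action labels, the index ordering, the discount factor of 0.99, and the function names are illustrative assumptions, not details reported in the abstract.

```python
from itertools import product

# Hypothetical labels for the 2 x 2 action grid described in Methods.
MODES = ("controlled", "uncontrolled")
OXYGENATION = ("low", "high")  # low vs. high PEEP/FiO2 support
ACTIONS = list(product(MODES, OXYGENATION))  # four discrete actions, indexed 0..3


def encode_action(mode: str, oxygenation: str) -> int:
    """Map a (mode, oxygenation) pair to its discrete action index."""
    return ACTIONS.index((mode, oxygenation))


def double_dqn_target(reward, q_online_next, q_target_next, done, gamma=0.99):
    """Double DQN bootstrap target for one 1-hour transition.

    The online network selects the next action and the target network
    evaluates it, which reduces the overestimation seen in plain DQN.
    gamma is an assumed discount factor.
    """
    if done:
        # Terminal step (e.g. end of the ventilation episode): no bootstrap.
        return reward
    # Action chosen by the online network's Q-values for the next state.
    best_next = max(range(len(q_online_next)), key=q_online_next.__getitem__)
    # Value of that action according to the target network.
    return reward + gamma * q_target_next[best_next]
```

In this sketch, `q_online_next` and `q_target_next` stand in for the two networks' Q-value outputs over the four actions at the next hourly state; in practice they would come from the trained networks rather than from hand-written lists.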
Parker et al. (Fri,) studied this question.