Abstract

Background
Shock is one of the leading causes of mortality in the intensive care unit (ICU). Artificial intelligence-guided sepsis management protocols have been reported previously, but with a few critical limitations. We hypothesized that a finer-grained state definition, a real-world reward function design, and a machine-in-the-loop, multi-agent reinforcement learning (RL) approach would improve overall shock treatment while closely reflecting clinicians' thought processes.

Materials and Methods
Using the multigranular ICU database Medical Information Mart for Intensive Care (MIMIC-IV, ver 3.0), we selected patients with shock of any cause, defined as a mean arterial pressure (MAP) < 65 mmHg for more than 30 minutes or ongoing vasopressor use. All tabular data and numerical physiologic vital-sign flowsheets were pre-processed into 1-hour segments as input data. Pre-shock (defined as 65 ≤ MAP < 75 mmHg) and gap-shock (defined as the interval between two shock periods, regardless of MAP) were annotated. States were defined from a combination of physiologic trends in vital parameters and markers of tissue perfusion and organ dysfunction, and were generated with a supervised contrastive learning process. Actions were discretized directions of crystalloid and vasopressor administration, along with binary use of red blood cell transfusion. The reward structure combined short-term (minimizing shock time), intermediate-term (minimizing organ failure), and long-term (minimizing mortality) goals, customized for each agent. For the RL model, we employed QR-DQN (Distributional Reinforcement Learning with Quantile Regression), in which high-risk actions could be better tolerated with increased adaptability to various clinical scenarios, while restricting wildly random actions that run against pre-existing clinical guidelines. A machine-in-the-loop alert system was devised for conditional execution of RL-based treatment (Figure 1).
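The phase annotation described in the methods can be illustrated with a minimal sketch. This is not the authors' pipeline: the function name `label_hours`, the label strings, and the input layout (one hourly mean MAP value and one vasopressor flag per 1-hour segment) are hypothetical. Because the "more than 30 minutes" criterion cannot be resolved on 1-hour segments, the sketch assumes an hour counts as shock when its mean MAP is below 65 mmHg or a vasopressor is running. It further assumes that any non-shock hour outside a gap with 65 ≤ MAP < 75 mmHg is pre-shock.

```python
from typing import List, Sequence

# Hypothetical label names, chosen for illustration only.
SHOCK, PRE_SHOCK, GAP_SHOCK, STABLE = "shock", "pre-shock", "gap-shock", "stable"


def label_hours(map_mmhg: Sequence[float],
                on_vasopressor: Sequence[bool]) -> List[str]:
    """Assign one phase label to each 1-hour segment of an ICU stay."""
    if len(map_mmhg) != len(on_vasopressor):
        raise ValueError("inputs must cover the same number of hours")

    # Assumption: an hour is a shock hour if hourly mean MAP < 65 mmHg
    # or a vasopressor is ongoing during that hour.
    is_shock = [m < 65.0 or bool(v) for m, v in zip(map_mmhg, on_vasopressor)]
    shock_hours = [i for i, s in enumerate(is_shock) if s]
    first = shock_hours[0] if shock_hours else None
    last = shock_hours[-1] if shock_hours else None

    labels = []
    for i, (m, s) in enumerate(zip(map_mmhg, is_shock)):
        if s:
            labels.append(SHOCK)
        elif first is not None and first < i < last:
            # A non-shock hour lying between two shock periods is
            # gap-shock, regardless of MAP.
            labels.append(GAP_SHOCK)
        elif 65.0 <= m < 75.0:
            labels.append(PRE_SHOCK)
        else:
            labels.append(STABLE)
    return labels


if __name__ == "__main__":
    hourly_map = [80, 72, 60, 70, 85, 68, 78, 70]
    vasopressor = [False, False, False, False, False, True, False, False]
    print(label_hours(hourly_map, vasopressor))
```

In this toy stay, hours 3 and 4 fall between a hypotensive hour and a vasopressor hour, so they are labelled gap-shock even though their MAP is 70 and 85 mmHg.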
Results
A total of 27,438 patients with shock of any cause were identified, with a mean ICU stay of 137.9 hours (median 73 hours). A total of 10 actions were derived from crystalloid, vasopressor, and transfusion. The alert system was activated 29.6% of the time during shock, 18.1% during gap-shock, and 3.7% during stable periods, across different types of shock (septic, cardiogenic, or hypovolemic/hemorrhagic). The system intervention reduced the length of shock by 71%. The composite Q-value (sum of potential benefit) from the individual agents expressed through the system exceeded the identifiable Q-value of clinicians' management by 88.6%, suggesting a potentially remarkable improvement in clinical outcomes with this approach.

Conclusions
The use of a multi-agent deep RL model, together with a customized action threshold and composite reward structure for each agent and an alert system, enabled dependable and potentially superior clinical decision support for shock treatment.

This abstract is funded by: NIH R35GM159939 (National Institutes of Health); RS-2024-00439677 (Korea Health Industry Development Institute)
Yoon et al. (Fri,) studied this question.