May 8, 2026 | Open Access

Abstract Number: Esoc2026a1955

Training and Validation of a Medical Large Language Model (MedGemma) to Predict 90-Day Mortality of Stroke Survivors From Discharge Notes

Key Points

This research aims to train and validate a large language model to predict 90-day mortality in stroke survivors using discharge notes.
Trained MedGemma-4B, a domain-specific medical large language model, on discharge notes from the MIMIC-IV database (n=5198).
Externally validated the model on patient records from Columbia University Medical Center (n=2704).
Evaluated model performance with the area under the receiver operating characteristic curve (AUC).
In MIMIC-IV cross-validation, mean AUCs were 0.74 for ischemic stroke, 0.77 for intracerebral hemorrhage (ICH), and 0.79 for subarachnoid hemorrhage (SAH).
In Columbia University external validation, AUCs were 0.83 for ischemic stroke, 0.90 for ICH, and 0.88 for SAH, with observed mortality of 162/1963, 82/546, and 24/195, respectively.
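The AUC values above measure how well predicted risk separates patients who died from those who survived. The abstract does not describe the authors' evaluation code. The following is a minimal sketch of the rank-based definition: the probability that a randomly chosen patient who died receives a higher predicted risk than a randomly chosen survivor, with ties counted as half. The function name `roc_auc` and the six-patient example are hypothetical and are not study data.

```python
from itertools import product


def roc_auc(scores, labels):
    """Rank-based ROC AUC (hypothetical helper, not the study's code).

    Counts, over every (death, survivor) pair, how often the patient who died
    received the higher predicted risk; ties contribute 0.5.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]  # patients who died
    neg = [s for s, y in zip(scores, labels) if y == 0]  # survivors
    if not pos or not neg:
        raise ValueError("AUC needs at least one death and one survivor")
    wins = 0.0
    for p, n in product(pos, neg):
        if p > n:
            wins += 1.0
        elif p == n:
            wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy example: predicted 90-day mortality risks for six patients
# (1 = died within 90 days, 0 = survived). These are NOT data from the study.
risks = [0.9, 0.7, 0.4, 0.6, 0.2, 0.1]
deaths = [1, 1, 1, 0, 0, 0]

# 8 of the 9 death/survivor pairs are ranked correctly, so AUC = 8/9.
print(roc_auc(risks, deaths))
```

An AUC of 0.5 corresponds to chance-level ranking and 1.0 to perfect separation. The pairwise loop is quadratic in cohort size. For cohorts of thousands of patients, a library routine such as scikit-learn's `roc_auc_score` computes the same quantity efficiently.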

Abstract

Background and aims

Free-text clinical notes contain rich prognostic information that is often lost in models relying solely on structured variables. Moreover, structured data such as National Institutes of Health Stroke Scale (NIHSS) scores are frequently incomplete or labor-intensive to extract, whereas free text is universally available. Large language models (LLMs) can leverage unstructured narrative text without manual feature extraction. We trained and validated a domain-specific medical LLM (MedGemma) to predict 90-day mortality from the discharge notes of stroke survivors.

Methods

We trained and externally validated MedGemma-4B to predict 2–90-day post-discharge mortality in stroke survivors using hospital discharge notes. Separate models were trained for (1) ischemic stroke, (2) spontaneous intracerebral hemorrhage (ICH), and (3) nontraumatic subarachnoid hemorrhage (SAH). For training, we used the Medical Information Mart for Intensive Care (MIMIC-IV) public database from Beth Israel Deaconess Medical Center (Boston, MA; 2008–2019). For external validation, we used stroke patient records from Columbia University Medical Center (New York, NY; 2020–2024). Discriminative performance was evaluated with the area under the receiver operating characteristic curve (AUC).

Results

In cross-validation on the MIMIC-IV dataset (n=5198), mean AUCs were 0.74±0.08 for ischemic stroke (mortality 452/2332, 19.4%), 0.77±0.04 for ICH (344/2079, 16.5%), and 0.79±0.05 for nontraumatic SAH (61/787, 7.8%). In external validation at Columbia University (n=2704), AUCs were 0.83 for ischemic stroke (mortality 162/1963, 8.3%), 0.90 for ICH (82/546, 15.0%), and 0.88 for nontraumatic SAH (24/195, 12.8%).

Conclusions

LLMs can provide accurate stroke risk stratification from narrative clinical notes. This overcomes the limitations of conventional prognostic scores, which rely on labor-intensive (and sometimes incomplete) extraction of structured variables.

Conflict of interest

Anh T. Tran, PhD: nothing to disclose; Joshua Z. Willey, MD: nothing to disclose; Santosh B. Murthy, MD: nothing to disclose; Guido J. Falcone, MD: nothing to disclose; Lee H. Schwamm, MD: nothing to disclose; Kevin N. Sheth, MD: nothing to disclose; Seyedmehdi Payabvash, MD: nothing to disclose
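The cohort sizes and mortality rates in the Results can be cross-checked from the reported death counts. The short script below is a hedged arithmetic sketch; the names `COHORTS` and `summarize` are hypothetical. The per-subtype counts sum to n=5198 and n=2704, and the script reproduces each printed percentage except Columbia SAH. For that subgroup, 24/195 works out to about 12.3% rather than the reported 12.8%, so either the count or the percentage may contain a typographical slip.

```python
# Reported (deaths, patients) per stroke subtype, copied from the abstract.
# COHORTS and summarize() are hypothetical names used only for this check.
COHORTS = {
    "MIMIC-IV (cross-validation)": {
        "ischemic stroke": (452, 2332),
        "ICH": (344, 2079),
        "SAH": (61, 787),
    },
    "Columbia (external validation)": {
        "ischemic stroke": (162, 1963),
        "ICH": (82, 546),
        "SAH": (24, 195),
    },
}


def summarize(cohorts):
    """Return {cohort: (total patients, {subtype: mortality % to one decimal})}."""
    summary = {}
    for cohort, subtypes in cohorts.items():
        total = sum(n for _, n in subtypes.values())
        rates = {
            name: round(100 * deaths / n, 1)
            for name, (deaths, n) in subtypes.items()
        }
        summary[cohort] = (total, rates)
    return summary


for cohort, (total, rates) in summarize(COHORTS).items():
    print(cohort, total, rates)
```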
