Introduction: Precision critical care requires large and representative data. We developed a multi-center, multimodal, high-resolution critical care dataset including electronic health record (EHR), imaging, and waveform data. We report the composition of the initial 50,000 patient admissions and lessons learned.
Methods: We included a random sample of adult, pediatric, and neonatal intensive care unit (ICU) admissions. Data partitions specified 60% training, 20% test, and 20% hold-out sets to enhance robustness and facilitate regulatory compliance (e.g., FDA filings). The CHoRUS Trusted Research Environment (TRE) and AI/ML workspace were developed on a cloud platform to promote collaborative computing and privacy. EHR data included thousands of laboratory, medication, and nursing flowsheet variables, along with diagnoses and notes, mapped to the Observational Medical Outcomes Partnership (OMOP) Common Data Model, which was extended with custom ICU concepts and linkages to imaging (DICOM) and waveforms (WFDB). OHDSI tools (ATLAS) enabled comparison of variations in ICU practice patterns.
Results: As of 7/1/25, the CHoRUS dataset had accrued 50,637 ICU admissions, comprising 1.6 billion rows of OMOP EHR data, 23 TB of waveforms, and initial radiology data for 7,642 patients. ICD-10 diagnoses included acute kidney injury (9,491 patients), sepsis (8,880), shock (5,221), trauma (3,068), ARDS (951), acute MI (2,264), pulmonary embolism (1,677), subarachnoid hemorrhage (1,637), subdural hemorrhage (2,661), and intracranial hemorrhage (1,053). No site exceeded 18% of the cohort. Race and ethnicity varied by hospital: 35% Black and 30% Hispanic. Age distribution was bimodal, peaking for neonates and ages 60-70 years. Death was recorded in 21.4%. The CHoRUS TRE had onboarded 226 active user accounts, including an NIH AI training program (AIM-AHEAD) and 3 datathons, which yielded iterative data quality improvement cycles. The TRE hosts AI/ML tools for data labeling and model development/validation.
Conclusions: The Bridge2AI CHoRUS critical care dataset offers multimodal, multispecialty, high-resolution data in a trusted research environment to support generalizable clinical care AI and comparative effectiveness research. Details are available at www.github.com/chorus-ai.
Rosenthal et al. (Sun,) studied this question.
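Illustrative sketch: The 60% training, 20% test, and 20% hold-out partition described in the Methods could be implemented in several ways; the abstract does not state the mechanism. The Python sketch below shows one possible approach, assuming assignment is made per patient (so that repeat admissions stay in one partition) by hashing an identifier. The function name `assign_partition`, the salt value, and the synthetic identifiers are hypothetical and are not taken from the CHoRUS codebase.

```python
import hashlib
from collections import Counter

# Partition fractions as stated in the abstract.
SPLITS = (("training", 0.60), ("test", 0.20), ("hold-out", 0.20))


def assign_partition(patient_id: str, salt: str = "example-salt") -> str:
    """Deterministically map a patient identifier to a partition.

    Assumption: splitting is done at the patient level with a salted hash;
    the source does not describe the actual procedure.
    """
    digest = hashlib.sha256(f"{salt}:{patient_id}".encode("utf-8")).digest()
    # Convert the first 8 bytes of the hash to a number in [0, 1).
    u = int.from_bytes(digest[:8], "big") / 2**64
    cumulative = 0.0
    for name, fraction in SPLITS:
        cumulative += fraction
        if u < cumulative:
            return name
    return SPLITS[-1][0]  # guard against floating-point rounding


if __name__ == "__main__":
    # Synthetic identifiers for demonstration only.
    ids = [f"patient-{i}" for i in range(50_000)]
    counts = Counter(assign_partition(p) for p in ids)
    for name, _ in SPLITS:
        print(name, round(counts[name] / len(ids), 3))
```

A hash-based assignment keeps each patient in the same partition as new admissions accrue, which helps prevent leakage between the training and hold-out sets; a seeded random shuffle would be an equally reasonable alternative for a fixed cohort.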