Systematic review of foundation models for structured electronic health records

March 16, 2026

Key Points

The aim was to characterize foundation models pretrained on structured EHR data and to assess trends in their application and methodology.
The authors systematically searched MEDLINE and Embase for relevant studies published from 2018 to October 2025.
The review focused on models pretrained on structured EHR data using self-supervised learning and applied to clinical prediction tasks.
Study selection and data abstraction were performed in duplicate.
Study characteristics were summarized and stratified by median publication year.
Fifty-three studies were included, and the number of publications increased over time.
Most datasets (79%) originated from the United States, and no model was pretrained exclusively on pediatric cohorts.
Model architecture shifted towards transformers, with longer context windows.
Application shifted from exclusively embedding-based use toward generative or mixed use.
Only 26% of studies evaluated transfer to external datasets, and none described clinical deployment.

Abstract

Purpose: Foundation models pretrained on structured electronic health record (EHR) data promise improved predictive performance, sample efficiency and resilience to distribution shifts. However, their design, scale and use remain unclear. The objectives were to characterize foundation models pretrained on structured EHR data; to examine temporal trends in model application, scale, architecture and design; and to assess the extent to which publications omitted methodological details.

Methods: We searched MEDLINE and Embase (2018 to October 2025) for foundation models pretrained on structured EHR data using self-supervised learning and applied to clinical prediction tasks. Study selection and data abstraction were performed in duplicate. Characteristics were summarized and stratified by median publication year.

Results: Fifty-three studies were included, and publications increased over time. Most datasets (79%) originated from the United States. No model was pretrained exclusively on pediatric cohorts. Model architecture shifted towards transformers (P = .013) with longer context windows (P = .028), while application shifted from exclusively embedding-based use toward generative or mixed use (P < .001). Choices regarding feature inclusion, temporal representation, self-supervised objective and downstream adaptation remained heterogeneous. Only 26% of studies evaluated transfer to external datasets, and none described clinical deployment. Key indicators of scale and compute were frequently unreported.

Conclusions: EHR foundation models are proliferating and are increasingly transformer-based and generative. Yet methodological choices and reporting remain fragmented, indicating that design trade-offs and best practices for EHR foundation models have not yet been established, and no study has described clinical deployment. Future work should clarify which design choices improve performance, robustness and transferability, increase reporting transparency, and determine whether these models can be implemented to improve patient-important outcomes.

Authors

Lin Lawrence Guo

Santiago Eduardo Arciniegas

Adam P. Yan

Journals

Journal of the American Medical Informatics Association

Institutions

University of Toronto

Stanford University

Hospital for Sick Children

University Health Network
