Abstract

Purpose: Foundation models pretrained on structured electronic health record (EHR) data promise improved predictive performance, sample efficiency and resilience to distribution shifts. However, their design, scale and use remain unclear. Our objectives were to characterize foundation models pretrained on structured EHR data; examine temporal trends in model application, scale, architecture and design; and assess the extent to which publications omitted methodological details.

Methods: We searched MEDLINE and Embase (2018 to October 2025) for foundation models pretrained on structured EHR data using self-supervised learning and applied to clinical prediction tasks. Study selection and data abstraction were performed in duplicate. Characteristics were summarized and stratified by median publication year.

Results: Fifty-three studies were included; publications increased over time. Most datasets (79%) originated from the United States. None pretrained exclusively on pediatric cohorts. Model architecture shifted toward transformers (P = .013) with longer context windows (P = .028), while application shifted from exclusively embedding-based toward generative or mixed use (P < .001). Choices regarding feature inclusion, temporal representation, self-supervised objective and downstream adaptation remained heterogeneous. Only 26% of studies evaluated transfer to external datasets, and none described clinical deployment. Key indicators of scale and compute were frequently unreported.

Conclusions: EHR foundation models are proliferating and are increasingly transformer-based and generative. Yet methodological choices and reporting remain fragmented, indicating that design trade-offs and best practices for EHR foundation models have not yet been established. None describe clinical deployment. Future work should clarify which design choices improve performance, robustness and transferability, increase reporting transparency and identify whether these models can be implemented to improve patient-important outcomes.
Lin Lawrence Guo
Santiago Eduardo Arciniegas
Adam P. Yan
University of Toronto
Journal of the American Medical Informatics Association
Stanford University
Hospital for Sick Children
University Health Network
Guo et al. studied this question.
synapsesocial.com/papers/69b79e638166e15b153aba90 — DOI: https://doi.org/10.1093/jamia/ocag033