What question did this study set out to answer?

The research aims to evaluate the performance and computational efficiency of LLaMA models compared to BERT in clinical information extraction tasks.

January 17, 2026

Information extraction from clinical notes: are we ready to switch to large language models?

Key Points

Developed a comprehensive annotated corpus of 1588 clinical notes from 4 data sources (UTP, MTSamples, MIMIC-III, and i2b2).
Benchmarked instruction-tuned LLaMA-2 and LLaMA-3 models against BERT for named entity recognition and relation extraction.
Assessed performance across datasets under both data-rich and data-limited conditions.
LLaMA models consistently outperformed BERT in clinical information extraction tasks across datasets.
In data-rich settings, LLaMA showed marginal improvements of approximately 1% for named entity recognition and 1.5% to 3.7% for relation extraction.
Under limited data conditions, LLaMA-3-70B improved F1 scores for named entity recognition by over 7%.
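The key points report F1 gains but do not say how predictions are matched against gold annotations, or whether "1%" and "7%" are absolute percentage points or relative changes. The following minimal sketch assumes strict exact-match, micro-averaged scoring, which is one common convention and not necessarily the one the study used. The function names `micro_prf` and `f1_gain`, the span tuple layout, and all numbers in the example are hypothetical.

```python
from typing import Hashable, Iterable, Tuple


def micro_prf(gold: Iterable[Hashable], pred: Iterable[Hashable]) -> Tuple[float, float, float]:
    """Micro-averaged precision, recall, and F1 under strict (exact) matching.

    Each item is a hashable tuple. For NER this might be a hypothetical
    (note_id, start, end, entity_type) span; for RE, a hypothetical
    (note_id, head_span, relation_type, tail_span) triple. An item counts
    as a true positive only if it appears identically in both sets.
    """
    gold_set, pred_set = set(gold), set(pred)
    tp = len(gold_set & pred_set)  # exact matches on boundaries and type
    precision = tp / len(pred_set) if pred_set else 0.0
    recall = tp / len(gold_set) if gold_set else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


def f1_gain(baseline_f1: float, new_f1: float) -> Tuple[float, float]:
    """Return (absolute gain in percentage points, relative gain in percent)."""
    if baseline_f1 <= 0:
        raise ValueError("baseline_f1 must be positive to compute a relative gain")
    absolute_points = (new_f1 - baseline_f1) * 100
    relative_percent = (new_f1 - baseline_f1) / baseline_f1 * 100
    return absolute_points, relative_percent


if __name__ == "__main__":
    # Hypothetical gold and predicted NER spans for one note.
    gold = {("n1", 0, 9, "problem"), ("n1", 20, 28, "medication"), ("n1", 40, 47, "test")}
    pred = {("n1", 0, 9, "problem"), ("n1", 20, 28, "test"), ("n1", 40, 47, "test")}
    # Each value is about 2/3: the span at 20-28 has the wrong entity type.
    print(micro_prf(gold, pred))

    # Hypothetical F1 scores, not taken from the study.
    print(f1_gain(0.80, 0.86))  # about 6 points absolute, 7.5% relative
```

The same 0.80 to 0.86 change reads as "6%" or "7.5%" depending on the convention, so it is worth checking which one a reported improvement uses before comparing numbers across papers.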

Abstract

Objectives. To assess the performance, generalizability, and computational efficiency of instruction-tuned Large Language Model Meta AI (LLaMA)-2 and LLaMA-3 models compared with bidirectional encoder representations from transformers (BERT) for clinical information extraction (IE), specifically named entity recognition (NER) and relation extraction (RE).

Materials and Methods. We developed a comprehensive annotated corpus of 1588 clinical notes from 4 data sources: UT Physicians (UTP) (1342 notes), Transcribed Medical Transcription Sample Reports and Examples (MTSamples) (146), Medical Information Mart for Intensive Care (MIMIC)-III (50), and Informatics for Integrating Biology and the Bedside (i2b2) (50). The corpus captures 4 clinical entities (problems, tests, medications, other treatments) and 16 modifiers (eg, negation, certainty). LLaMA-2 and LLaMA-3 were instruction-tuned for clinical NER and RE, and their performance was benchmarked against BERT.

Results. LLaMA models consistently outperformed BERT across datasets. In data-rich settings (eg, UTP), LLaMA achieved marginal gains (approximately 1% for NER and 1.5%-3.7% for RE). Under limited data conditions (eg, MTSamples, MIMIC-III) and on the unseen i2b2 dataset, LLaMA-3-70B improved F1 scores by over 7% for NER and 4% for RE. However, these gains came at a higher computational cost: LLaMA models required more memory and Graphics Processing Unit (GPU) hours and ran up to 28 times slower than BERT.

Discussion. Although LLaMA models offer better performance, their higher computational demands and slower throughput mean that performance must be balanced against practical resource constraints. The choice between LLMs and BERT for clinical IE should rest on application-specific considerations.

Conclusion. Instruction-tuned LLaMA models show promise for clinical NER and RE tasks. However, the tradeoff between improved performance and increased computational cost must be carefully evaluated. We release our Kiwi package (https://kiwi.clinicalnlp.org/) to facilitate the application of both LLaMA and BERT models in clinical IE applications.
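The abstract reports that LLaMA models ran up to 28 times slower than BERT. To show what a throughput ratio like that can mean for a processing budget, this minimal sketch converts it into estimated wall-clock hours. The function names `processing_hours` and `compare_models`, the 50 notes per second BERT throughput, and the one-million-note workload are hypothetical; only the 28x upper bound comes from the abstract.

```python
def processing_hours(n_notes: int, notes_per_second: float, n_gpus: int = 1) -> float:
    """Estimated wall-clock hours to process n_notes.

    Assumes throughput scales linearly with the number of GPUs, which is an
    optimistic simplification.
    """
    if n_notes < 0:
        raise ValueError("n_notes must be non-negative")
    if notes_per_second <= 0 or n_gpus < 1:
        raise ValueError("notes_per_second must be positive and n_gpus at least 1")
    return n_notes / (notes_per_second * n_gpus) / 3600.0


def compare_models(n_notes: int, bert_notes_per_second: float, slowdown: float, n_gpus: int = 1):
    """Return (bert_hours, llm_hours) when the LLM is `slowdown` times slower than BERT."""
    if slowdown <= 0:
        raise ValueError("slowdown must be positive")
    bert_hours = processing_hours(n_notes, bert_notes_per_second, n_gpus)
    llm_hours = processing_hours(n_notes, bert_notes_per_second / slowdown, n_gpus)
    return bert_hours, llm_hours


if __name__ == "__main__":
    # Hypothetical workload: one million notes, BERT at 50 notes/s on one GPU,
    # and the upper-bound slowdown of 28x reported in the abstract.
    bert_h, llm_h = compare_models(1_000_000, 50.0, 28.0)
    print(f"BERT: {bert_h:.1f} h, LLaMA: {llm_h:.1f} h")  # prints: BERT: 5.6 h, LLaMA: 155.6 h
```

Linear scaling ignores batching, sequence length, and hardware differences, so an estimate like this is a rough budget rather than a measurement.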


Authors

Yan Hu

The University of Texas Health Science Center at Houston

Xu Zuo

The University of Texas Health Science Center at Houston

Yujia Zhou

Zhejiang University

Journals

Journal of the American Medical Informatics Association

Institutions

Yale University

The University of Texas Health Science Center at Houston

Vanderbilt University Medical Center
