Abstract Objectives To assess the performance, generalizability, and computational efficiency of instruction-tuned Large Language Model Meta AI (LLaMA)-2 and LLaMA-3 models compared to bidirectional encoder representations from transformers (BERT) for clinical information extraction (IE) tasks, specifically named entity recognition (NER) and relation extraction (RE). Materials and Methods We developed a comprehensive annotated corpus of 1588 clinical notes from 4 data sources—UT Physicians (UTP) (1342 notes), Transcribed Medical Transcription Sample Reports and Examples (MTSamples) (146), Medical Information Mart for Intensive Care (MIMIC)-III (50), and Informatics for Integrating Biology and the Bedside (i2b2) (50), capturing 4 clinical entities (problems, tests, medications, other treatments) and 16 modifiers (eg, negation, certainty). Large Language Model Meta AI-2 and LLaMA-3 were instruction-tuned for clinical NER and RE, and their performance was benchmarked against BERT. Results Large Language Model Meta AI models consistently outperformed BERT across datasets. In data-rich settings (eg, UTP), LLaMA achieved marginal gains (approximately 1% improvement for NER and 1.5%-3.7% for RE). Under limited data conditions (eg, MTSamples, MIMIC-III) and on the unseen i2b2 dataset, LLaMA-3-70B improved F1 scores by over 7% for NER and 4% for RE. However, performance gains came with increased computational costs, with LLaMA models requiring more memory and Graphics Processing Unit (GPU) hours and running up to 28 times slower than BERT. Discussion While LLaMA models offer enhanced performance, their higher computational demands and slower throughput highlight the need to balance performance with practical resource constraints. Application-specific considerations are essential when choosing between LLMs and BERT for clinical IE. Conclusion Instruction-tuned LLaMA models show promise for clinical NER and RE tasks. However, the tradeoff between improved performance and increased computational cost must be carefully evaluated. We release our Kiwi package (https://kiwi.clinicalnlp.org/) to facilitate the application of both LLaMA and BERT models in clinical IE applications.
Building similarity graph...
Analyzing shared references across papers
Loading...
Yan Hu
The University of Texas Health Science Center at Houston
Xu Zuo
The University of Texas Health Science Center at Houston
Yujia Zhou
Zhejiang University
Journal of the American Medical Informatics Association
Yale University
The University of Texas Health Science Center at Houston
Vanderbilt University Medical Center
Building similarity graph...
Analyzing shared references across papers
Loading...
Hu et al. (Fri,) studied this question.
synapsesocial.com/papers/696b26b2d2a12237a934a06a — DOI: https://doi.org/10.1093/jamia/ocaf213