Medication errors pose a significant challenge in healthcare and call for innovative analytical methods. This study explored generative pre-trained large language models (LLMs) for Named Entity Recognition (NER) in Japanese medical incident reports. We assessed four LLMs (Llama-3-ELYZA, BioMistral-7B, GPT-4.0 mini, and GPT-4.0) on a national open-source dataset. We compared their NER performance against a published annotated version of the data and proposed a prompt-based framework for the clinical NER problem. Although GPT-4.0 outperformed the other models, it did not exceed the previously reported fine-tuned BERT model. Few-shot prompts achieved high accuracy for number-related entities (e.g., 'Strength_rate', F1-score: 0.951), matching the previous study, but clinically specific entity types underperformed because of language complexities. Despite these challenges, providing entity type definitions and a few examples improved GPT-4.0's performance. This highlights the potential of LLMs without extensive training, as well as the need to account for linguistic challenges in clinical NER.
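Per-type F1-scores such as the 0.951 reported for 'Strength_rate' come from comparing predicted entities with the annotated gold entities. The abstract does not state the matching criterion. The sketch below therefore assumes strict matching, where an entity counts as correct only if its start offset, end offset, and type all agree. The function names and the entity type `OtherType` are hypothetical and used only for illustration.

```python
from collections import Counter
from typing import List, Optional, Tuple

# (start offset, end offset, entity type); strict-match representation (assumption)
Entity = Tuple[int, int, str]


def entity_f1(gold: List[Entity], pred: List[Entity], etype: Optional[str] = None) -> float:
    """Strict-match entity-level F1, optionally restricted to one entity type."""
    if etype is not None:
        gold = [e for e in gold if e[2] == etype]
        pred = [e for e in pred if e[2] == etype]
    gold_counts, pred_counts = Counter(gold), Counter(pred)
    # True positives: entities appearing in both lists (multiset intersection)
    tp = sum((gold_counts & pred_counts).values())
    if tp == 0:
        return 0.0
    precision = tp / sum(pred_counts.values())
    recall = tp / sum(gold_counts.values())
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    gold = [(0, 4, "Strength_rate"), (10, 15, "Strength_rate"), (20, 30, "OtherType")]
    # The second prediction has the wrong end offset, so strict matching rejects it
    pred = [(0, 4, "Strength_rate"), (10, 14, "Strength_rate"), (20, 30, "OtherType")]
    print(entity_f1(gold, pred))                   # overall: 2/3
    print(entity_f1(gold, pred, "Strength_rate"))  # 0.5
    print(entity_f1(gold, pred, "OtherType"))      # 1.0
```

Under this strict criterion, a boundary error of a single character makes the prediction count as wrong. A relaxed (overlap-based) criterion would score the same predictions more generously, so results are only comparable across studies that use the same matching rule.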
Ogi et al. studied this question.