Background: Clinical note documentation is a vital yet time-intensive aspect of healthcare. Although advances in natural language processing (NLP) have transformed many domains, generating accurate summaries of doctor-patient conversations remains underexplored because few open-source datasets are available. Large language models (LLMs), trained on vast datasets, offer a promising solution to this challenge.

Objective: Precision in clinical summarization is crucial because it directly affects patient care and safety. This study evaluates the effectiveness of decoder-only LLMs compared with traditional encoder-decoder architectures in generating clinical notes from doctor-patient dialogues, focusing on maintaining medical accuracy and complying with healthcare privacy standards.

Methods: We used the MTS-DIALOG dataset, which contains 1,700 doctor-patient conversations paired with clinical notes. Our experiments involved fine-tuning several decoder-only LLMs, including Mistral, Meditron, and Llama, using a parameter-efficient fine-tuning approach.

Results: Model performance was evaluated using ROUGE and BERTScore. Meditron-7B and Llama3-8B achieved state-of-the-art results, and Mistral-7B also performed competitively. The findings indicate that decoder-only LLMs, particularly Llama variants, outperform traditional encoder-decoder models. Moreover, fine-tuning with higher quantization has the potential to further enhance performance.

Conclusions: This study underscores the potential of decoder-only LLMs to transform clinical workflows by streamlining medical documentation, thereby enabling healthcare professionals to dedicate more time to patient care.
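To make the ROUGE evaluation named in the Results concrete, the following is a minimal sketch of a ROUGE-1 F1 score (unigram overlap between a reference note and a generated note). It is an illustration only, not the evaluation code used in the study: the function name `rouge1_f1`, the whitespace tokenization, and the example sentences are assumptions introduced here, and published ROUGE implementations typically add stemming and further variants such as ROUGE-2 and ROUGE-L.

```python
from collections import Counter


def rouge1_f1(reference: str, candidate: str) -> float:
    """Illustrative ROUGE-1 F1: harmonic mean of unigram precision and recall.

    Assumes simple lowercased whitespace tokenization (a simplification).
    """
    ref_tokens = reference.lower().split()
    cand_tokens = candidate.lower().split()
    if not ref_tokens or not cand_tokens:
        return 0.0

    # Clipped overlap: each token counts at most as often as it appears
    # in both the reference and the candidate.
    overlap = sum((Counter(ref_tokens) & Counter(cand_tokens)).values())
    if overlap == 0:
        return 0.0

    precision = overlap / len(cand_tokens)  # share of candidate tokens matched
    recall = overlap / len(ref_tokens)      # share of reference tokens matched
    return 2 * precision * recall / (precision + recall)


# Hypothetical reference note and model output, for illustration only.
reference_note = "patient reports chest pain for two days"
generated_note = "patient reports chest pain since yesterday"
score = rouge1_f1(reference_note, generated_note)
```

In this hypothetical pair, four of the six generated tokens appear in the seven-token reference, so precision is 4/6, recall is 4/7, and the F1 score is 8/13.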
Ahmed et al. (Sun,) studied this question.