What question did this study set out to answer?

The aim is to automate the extraction of crucial infection indicators from home healthcare notes using instruction-tuned models.

April 15, 2026

Automating infection indicator extraction in home healthcare through instruction-tuned large language models

Key Points

Developed a schema of 26 infection indicator categories.
Expanded training data through targeted annotation, context mutation, and synthetic generation.
Adapted moderate-sized models via Quantized Low-Rank Adaptation (QLoRA).
Compared the performance of different model sizes on a held-out test set.
Instruction-tuned moderate-sized models outperformed both a larger prompted baseline and a smaller fully fine-tuned baseline.
The best model, augmented Gemma-12B, achieved a partial micro-averaged F1 score of 0.879.
Data augmentation improved identification of rare indicators and interpretation of negations.
The top model consistently maintained a partial F1 score above 0.750 across all categories.

Abstract

Objective

Home healthcare (HHC) clinical notes contain critical infection indicators that clinicians need in structured “indicator + context” pairs. Data sparsity and limited computing resources hinder automated extraction in decentralized HHC settings. This study developed and evaluated a resource-efficient pipeline using instruction-tuned, moderate-sized large language models (LLMs) to address these barriers. To address the data sparsity challenge, we also assessed the impact of a targeted LLM-based data augmentation strategy.

Materials and Methods

An expert-defined schema of 26 infection indicator categories was developed. We expanded the training set using a 3-stage workflow: targeted annotation, context mutation, and synthetic generation. We adapted 2 moderate-sized models (Gemma-12B and Qwen-14B) via Quantized Low-Rank Adaptation (QLoRA). We compared them to a larger-sized, prompted model and a smaller-sized, fully fine-tuned LLM. We evaluated all models on a held-out test set using partial micro-averaged F1 score, output reliability metrics, and qualitative error analysis.

Results

Instruction-tuned moderate-sized LLMs outperformed both baselines. The top-performing model, augmented Gemma-12B, achieved a partial micro-averaged F1 score of 0.879. LLM-based data augmentation enhanced overall performance, improving the identification of rare indicators and the interpretation of negations. The best model maintained a partial F1 score above 0.750 across all indicator categories. It also showed high format adherence, confirming its ability to generate reliable structured outputs.

Discussion

Instruction-tuning moderate-sized LLMs with QLoRA and targeted data augmentation enables high-accuracy extraction of infection indicators from HHC notes.

Conclusion

This resource-efficient pipeline provides a scalable foundation for automated infection surveillance in healthcare settings with limited resources.
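The headline metric, a partial micro-averaged F1 score, pools true positives, false positives, and false negatives across all indicator categories before computing precision and recall. The sketch below illustrates that pooling in plain Python. It is not the study's evaluation code: the abstract does not define its partial-match criterion, so the rule used here (same category plus at least one shared token) is an assumption, and the `Extraction` type, the function names, and the category names "fever" and "erythema" are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Extraction:
    """One structured output: an indicator category plus the extracted text."""
    category: str  # hypothetical category name, e.g. "fever"
    text: str      # indicator text found in the note


def partially_matches(pred: Extraction, gold: Extraction) -> bool:
    """ASSUMED partial-match rule: same category and at least one shared token."""
    if pred.category != gold.category:
        return False
    pred_tokens = set(pred.text.lower().split())
    gold_tokens = set(gold.text.lower().split())
    return bool(pred_tokens & gold_tokens)


def micro_f1(notes) -> float:
    """Micro-averaged F1 over (predictions, gold) pairs, one pair per note.

    Counts are pooled over all notes and categories before computing
    precision and recall, which is what "micro-averaged" means.
    """
    tp = fp = fn = 0
    for preds, golds in notes:
        unmatched = list(golds)  # each gold item can be matched at most once
        for pred in preds:
            hit = next((g for g in unmatched if partially_matches(pred, g)), None)
            if hit is None:
                fp += 1
            else:
                tp += 1
                unmatched.remove(hit)
        fn += len(unmatched)  # gold items that no prediction matched
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Toy example: one note with two gold indicators and two predictions.
example_notes = [
    (
        [Extraction("fever", "fever"), Extraction("erythema", "redness")],
        [Extraction("fever", "low grade fever"),
         Extraction("wound_drainage", "purulent drainage")],
    )
]
example_score = micro_f1(example_notes)  # 1 TP, 1 FP, 1 FN
```

In the toy example, one prediction partially matches a gold item, one is spurious, and one gold item is missed, so precision and recall are both 0.5. Because counts are pooled, frequent categories dominate a micro-averaged score, which is presumably why the study also reports a per-category floor.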


Cite This Study

Xu et al. studied this question.

synapsesocial.com/papers/69df2a99e4eeef8a2a6afad6 https://doi.org/10.1093/jamia/ocag040
