May 9, 2026 | Open Access

Optimising clinical information extraction: a comparative study of retrieval-augmented generation techniques in clinical notes

Abstract

Extracting clinically meaningful information from free-text notes in specific clinical settings, such as Australian aged care facilities, remains challenging because the notes are heterogeneous and lack standardised structure and terminology. Retrieval-augmented generation (RAG) can improve the precision and grounding of large language model (LLM) outputs; however, the choice of retrieval strategy remains understudied despite its critical importance for clinical information extraction (IE). Using real-world clinical notes from Australian aged care facilities, we systematically compare six retrieval methods within a unified RAG pipeline: sparse retrieval (BM25), dense retrieval (bi-encoder embeddings), dense retrieval with cross-encoder reranking (abbreviated as dense reranking), dynamic linear fusion of sparse and dense scores, reciprocal rank fusion (RRF), and hybrid coarse-to-fine reranking. We evaluate these strategies on two clinical named entity recognition tasks, extracting agitation symptoms in dementia (n = 208) and identifying malnutrition risk factors (n = 208), across five dimensions: context relevance, answer quality, source faithfulness, contextual diversity, and item-level accuracy. A repeated-measures ANOVA reveals that reranking and hybrid ensemble strategies significantly outperform both standalone sparse (BM25) and standalone dense retrieval on both tasks. For agitation extraction, dense reranking achieves the highest answer F1 (0.946), context diversity (0.895), and item-level accuracy (0.963). For malnutrition, ensemble methods yield the best item-level accuracy (0.944), followed closely by dense reranking. Through error analysis, we identify three LLM error types in clinical named entity recognition (intrinsic hallucination, extrinsic hallucination, and false negatives) and elucidate how RAG mitigates each.
These findings demonstrate that reranking-based retrieval substantially enhances the performance of RAG pipelines for clinical information extraction and offers a practical approach for improving automated analysis of unstructured clinical text. Our four-stage experimental workflow (document indexing, context retrieval, LLM generation, and structured output formatting) provides a replicable framework for future clinical information extraction research and downstream predictive modelling.
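
Of the six retrieval methods compared, reciprocal rank fusion (RRF) is the simplest to illustrate in isolation. The sketch below shows the standard RRF formulation, in which each document's fused score is the sum of 1 / (k + rank) over the ranked lists that contain it. It is a minimal illustration only and is not taken from the study: the function name, the note identifiers, the constant k = 60 (a commonly used default), and the alphabetical tie-break are all assumptions.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document ids into one ranking.

    rankings: list of lists, each ordered best-first (e.g. one from a
              sparse retriever and one from a dense retriever).
    k: smoothing constant; 60 is a common default and an assumption here,
       not a value reported in the abstract.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        # Ranks are 1-based, so the top document contributes 1 / (k + 1).
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties are broken alphabetically by id.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical note ids standing in for retrieved clinical-note chunks.
sparse_ranking = ["note_3", "note_1", "note_7"]
dense_ranking = ["note_1", "note_5", "note_3"]

fused = reciprocal_rank_fusion([sparse_ranking, dense_ranking])
print(fused)
```

Because RRF uses only rank positions, it needs no score normalisation between the sparse and dense retrievers. A linear fusion of the two score sets, by contrast, must first bring them onto a comparable scale.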
