June 14, 2026 · Open Access

Development and Multireader Evaluation of Radiological RAG-System

Key Points

This study aims to develop and evaluate a retrieval-augmented generation (RAG) system for answering radiology-related queries.
Built a knowledge base of 1049 documents from a survey of practicing radiologists and expert validation of sources
Used structured document parsing and a two-level parent–child vector database
Evaluated performance through functional testing, automated LLM-as-a-judge metrics, and multireader expert assessment
Achieved automated Contextual Precision of 0.735, Contextual Recall of 0.881, and Answer Relevancy of 0.890
Achieved a mean expert-rated response accuracy of 4.53 out of 5
Showed substantial to excellent inter-expert agreement for most Likert-scale criteria
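
To make the retrieval metrics named above more concrete, here is a minimal sketch of one common way such scores are computed from binary relevance judgments. The summary does not state which formulas or tooling the study used, so the rank-weighted definition of contextual precision and the statement-level definition of contextual recall below are assumptions, and the function names are hypothetical.

```python
def contextual_precision(relevance):
    """Rank-aware precision over retrieved chunks (assumed definition).

    relevance: list of 0/1 judgments for retrieved chunks, in ranked order.
    Each relevant chunk contributes precision-at-its-rank; the sum is
    averaged over the number of relevant chunks.
    """
    total_relevant = sum(relevance)
    if total_relevant == 0:
        return 0.0
    hits = 0
    score = 0.0
    for k, rel in enumerate(relevance, start=1):
        if rel:
            hits += 1
            score += hits / k  # precision at rank k
    return score / total_relevant


def contextual_recall(supported):
    """Share of reference-answer statements supported by retrieved context.

    supported: list of 0/1 flags, one per statement in the reference answer.
    """
    if not supported:
        return 0.0
    return sum(supported) / len(supported)


# Relevant chunks ranked 1st and 3rd score lower than ranked 1st and 2nd.
print(contextual_precision([1, 0, 1]))
print(contextual_precision([1, 1, 0]))
print(contextual_recall([1, 1, 1, 0]))
```

Under this definition, precision rewards rankings that place relevant chunks first, while recall only asks whether the needed information was retrieved at all.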

Abstract

Large language models (LLMs) are increasingly being used in radiology-related workflows, but their application to reference, regulatory, and methodological queries remains limited by hallucinations and the static nature of model knowledge. This study aimed to develop and evaluate a retrieval-augmented generation (RAG) system for radiologists designed to provide grounded responses to such queries. A knowledge base was created through a survey of practicing radiologists and expert validation of sources, resulting in a corpus of 1049 documents. The system incorporated structured document parsing, a two-level parent–child vector database, hybrid dense–sparse retrieval, reranking, and a local large language model. Performance was assessed through functional testing, automated LLM-as-a-judge metrics, and multireader expert evaluation by 16 radiologists using 400 technical queries. No hallucinations were detected in the 77-query functional testing set during expert review. On the full technical dataset, automated Contextual Precision, Contextual Recall, and Answer Relevancy were 0.735, 0.881, and 0.890, respectively. Expert evaluation showed high response accuracy (mean, 4.53/5) and high expert-assessed Contextual Precision (0.886). Inter-expert agreement was substantial to excellent for most Likert-scale criteria. These findings suggest that a hierarchical RAG architecture can provide reliable access to radiology-specific reference information, although external validation and automated updating of the knowledge base remain necessary.
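
The retrieval design described in the abstract can be illustrated with a small, self-contained sketch. It is not the authors' implementation: the toy corpus, the bag-of-words cosine (a stand-in for dense embeddings), the term-overlap score (a stand-in for a sparse scorer such as BM25), and the use of reciprocal rank fusion to combine the two rankings are all assumptions made for illustration. The reranking and generation steps are omitted.

```python
import math
from collections import Counter

# Hypothetical toy corpus: parent sections and the child chunks cut from them.
PARENTS = {
    "p1": "Radiation dose limits for occupational exposure. Annual dose limits apply to staff.",
    "p2": "Contrast media protocols. Screening for renal function precedes contrast administration.",
}
CHILDREN = [
    ("c1", "p1", "radiation dose limits for occupational exposure"),
    ("c2", "p1", "annual dose limits apply to staff"),
    ("c3", "p2", "contrast media protocols"),
    ("c4", "p2", "screening for renal function precedes contrast administration"),
]


def tokens(text):
    return text.lower().split()


def cosine(a, b):
    # Bag-of-words cosine similarity; stands in for dense embedding similarity.
    ca, cb = Counter(a), Counter(b)
    dot = sum(ca[t] * cb[t] for t in ca)
    na = math.sqrt(sum(v * v for v in ca.values()))
    nb = math.sqrt(sum(v * v for v in cb.values()))
    return dot / (na * nb) if na and nb else 0.0


def overlap(a, b):
    # Count of distinct shared terms; stands in for a sparse lexical score.
    return len(set(a) & set(b))


def rank(scores):
    # Child ids by descending score, ties broken by id for determinism.
    return [cid for cid, _ in sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))]


def rrf(rankings, k=60):
    # Reciprocal rank fusion: sum 1 / (k + position) across the rankings.
    fused = Counter()
    for ranking in rankings:
        for pos, cid in enumerate(ranking, start=1):
            fused[cid] += 1.0 / (k + pos)
    return rank(fused)


def retrieve_parents(query, top_k=2):
    q = tokens(query)
    dense = {cid: cosine(q, tokens(text)) for cid, _, text in CHILDREN}
    sparse = {cid: float(overlap(q, tokens(text))) for cid, _, text in CHILDREN}
    fused = rrf([rank(dense), rank(sparse)])
    parent_of = {cid: pid for cid, pid, _ in CHILDREN}
    # Search on small child chunks, but return each matching parent once.
    parents = []
    for cid in fused[:top_k]:
        pid = parent_of[cid]
        if pid not in parents:
            parents.append(pid)
    return [(pid, PARENTS[pid]) for pid in parents]


for pid, text in retrieve_parents("annual dose limits for staff"):
    print(pid, text)
```

The point of the two-level layout is that matching happens on short child chunks, where similarity scores are sharper, while the language model receives the full parent section, so the answer is grounded in complete context rather than a fragment.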
