Large language models (LLMs) are increasingly being used in radiology-related workflows, but their application to reference, regulatory, and methodological queries remains limited by hallucinations and the static nature of model knowledge. This study aimed to develop and evaluate a retrieval-augmented generation (RAG) system for radiologists designed to provide grounded responses to such queries. A knowledge base was created through a survey of practicing radiologists and expert validation of sources, resulting in a corpus of 1049 documents. The system incorporated structured document parsing, a two-level parent–child vector database, hybrid dense–sparse retrieval, reranking, and a local large language model. Performance was assessed through functional testing, automated LLM-as-a-judge metrics, and multireader expert evaluation by 16 radiologists using 400 technical queries. No hallucinations were detected in the 77-query functional testing set during expert review. On the full technical dataset, automated Contextual Precision, Contextual Recall, and Answer Relevancy were 0.735, 0.881, and 0.890, respectively. Expert evaluation showed high response accuracy (mean, 4.53/5) and high expert-assessed Contextual Precision (0.886). Inter-expert agreement was substantial to excellent for most Likert-scale criteria. These findings suggest that a hierarchical RAG architecture can provide reliable access to radiology-specific reference information, although external validation and automated updating of the knowledge base remain necessary.
Erizhokov et al. studied this question.
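The retrieval stage described above (hybrid dense–sparse retrieval over a two-level parent–child index) can be illustrated with a minimal sketch. The abstract does not state how the dense and sparse rankings are combined, so reciprocal rank fusion (RRF) is used here purely as an assumed stand-in. The function names, the constant `k=60`, and the `child_to_parent` mapping are hypothetical and are not taken from the study; the reranking and generation steps are omitted.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of child-chunk ids into a single ranking.

    Assumption: RRF is one common way to merge dense and sparse results;
    the study may have used a different fusion method.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, chunk_id in enumerate(ranking, start=1):
            # Each list contributes 1 / (k + rank) for every chunk it contains.
            scores[chunk_id] += 1.0 / (k + rank)
    # Highest fused score first; ties are broken by id for determinism.
    return sorted(scores, key=lambda cid: (-scores[cid], cid))


def expand_to_parents(child_ids, child_to_parent, top_n=3):
    """Map ranked child chunks to their parent sections.

    Parents are deduplicated and kept in the order in which their
    best-ranked child appears, so the generator sees whole sections.
    """
    parents = []
    for cid in child_ids:
        pid = child_to_parent[cid]
        if pid not in parents:
            parents.append(pid)
        if len(parents) == top_n:
            break
    return parents


# Hypothetical toy data: ids returned by a dense and a sparse retriever.
dense_hits = ["c1", "c2", "c3"]
sparse_hits = ["c3", "c1", "c4"]
child_to_parent = {"c1": "p1", "c2": "p1", "c3": "p2", "c4": "p3"}

fused = reciprocal_rank_fusion([dense_hits, sparse_hits])
context_parents = expand_to_parents(fused, child_to_parent, top_n=2)
```

In this sketch, small child chunks are matched for precision, while the larger parent sections are what would be passed to the reranker and the language model as context.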