Large Language Models (LLMs) often struggle to generate accurate and reliable information, particularly in knowledge-intensive tasks. This limitation, referred to as hallucination, occurs when models produce content that is incorrect, irrelevant, or unsupported by evidence. Retrieval Augmented Generation (RAG) offers a promising approach by integrating relevant external knowledge, enabling models to generate factually grounded responses. This study evaluates a base LLM, a fine-tuned DistilBERT model, and two RAG architectures, Naïve RAG and Graph RAG, to assess their impact on reducing hallucinations and enhancing contextual understanding. On subsets of the HaluEval, Squad-V2, and TriviaQA benchmark datasets, the base model achieved accuracies of 10.18%, 12.67%, and 5.46%, respectively; Naïve RAG achieved 44.56%, 19.04%, and 35.32%; and the fine-tuned LLM achieved 72.5%, 72.31%, and 88.7%. Graph RAG achieved 8.85% and 15.12% on Squad-V2 and TriviaQA, respectively. Our findings show that while fine-tuned LLMs outperform baseline models, incorporating RAG solutions did not result in significant performance improvements, suggesting that the incorporation of external knowledge may not always align with the needs of the task. Experiments demonstrate that Graph RAG handles complex queries by leveraging relationships within structured knowledge graphs. Data organized as a knowledge graph may enable Graph RAG solutions to reach their full potential by utilizing their capacity to efficiently retrieve contextually relevant information. Although computational complexity remains a limitation, this study shows that RAG architectures might not consistently enhance LLM reliability in practical situations.
AboulEla et al. (Mon,) studied this question.