Abstract Large language models (LLMs) are at the core of many artificial intelligence (AI) systems. A key problem with these systems is hallucination (i.e., making up facts). Retrieval-Augmented Generation (RAG) addresses this problem by grounding responses in external knowledge sources, thereby improving their factual accuracy. A RAG system consists of two core components: an information retrieval component (retriever and reranker) and a text generation component (the LLM). Its efficacy therefore depends on the retrieval strategy, the reranking mechanism, and the generation model. In this study, we conduct a systematic evaluation of nine retriever–reranker configurations, formed by pairing three retrievers (Fusion, HyDE, and HyPE) with three rerankers (BGE, MiniLM, and GPT-4o-mini), within a controlled RAG framework. Our analysis extends beyond traditional retrieval metrics: alongside Mean Reciprocal Rank (MRR), we evaluate generation correctness, faithfulness, relevance, cost, and latency. Results show that LLM-based reranking consistently improves downstream generation quality. The HyPE + GPT-4o-mini configuration achieves the highest overall performance, with correctness and relevance scores of 0.8012 and 0.9267, respectively, and is the only configuration with a positive MRR gain. Cross-encoder rerankers offer lower latency and cost but exhibit a measurable decline in answer quality.
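As an illustration of the retrieval metric named in the abstract, the following is a minimal sketch of how MRR and an "MRR gain" could be computed over the nine-configuration grid. It is not the authors' evaluation code. The function name, the toy document IDs, and the reading of "MRR gain" as MRR after reranking minus MRR before reranking are all assumptions made for this example.

```python
from itertools import product

# The three retrievers and three rerankers named in the abstract.
RETRIEVERS = ["Fusion", "HyDE", "HyPE"]
RERANKERS = ["BGE", "MiniLM", "GPT-4o-mini"]

# Pairing every retriever with every reranker gives the nine configurations.
CONFIGS = list(product(RETRIEVERS, RERANKERS))


def mean_reciprocal_rank(ranked_lists, relevant_sets):
    """Mean over queries of 1/rank of the first relevant document.

    A query whose ranked list contains no relevant document contributes 0.
    (Hypothetical helper; the name and signature are assumptions.)
    """
    if not ranked_lists:
        return 0.0
    total = 0.0
    for ranked, relevant in zip(ranked_lists, relevant_sets):
        for rank, doc_id in enumerate(ranked, start=1):
            if doc_id in relevant:
                total += 1.0 / rank
                break
    return total / len(ranked_lists)


# Toy data (invented for illustration): two queries, one relevant document each.
relevant = [{"d1"}, {"d4"}]
before_rerank = [["d3", "d1", "d2"], ["d5", "d4", "d6"]]  # relevant doc at rank 2
after_rerank = [["d1", "d3", "d2"], ["d4", "d5", "d6"]]   # relevant doc at rank 1

mrr_before = mean_reciprocal_rank(before_rerank, relevant)
mrr_after = mean_reciprocal_rank(after_rerank, relevant)

# Assumed definition: a positive gain means reranking moved relevant documents up.
mrr_gain = mrr_after - mrr_before
```

Under this assumed definition, a configuration shows a positive MRR gain only when its reranker places the first relevant document higher, on average, than the retriever alone did.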
Harun Elkiran
İstanbul Sabahattin Zaim Üniversitesi
Jawad Rasheed
İstanbul Sabahattin Zaim Üniversitesi
Istanbul Medipol University
İstanbul Nişantaşı Üniversitesi
İstanbul Sabahattin Zaim Üniversitesi
Elkiran et al. (Sat,) studied this question.
synapsesocial.com/papers/6a0172233a9f334c282723d0 — DOI: https://doi.org/10.1007/s10791-026-10156-3