Retrieval-Augmented Generation (RAG) systems integrate large language models with information retrieval to ground responses in factual data. This study systematically evaluates the contribution of each RAG component in a medical question answering system through comprehensive ablation analysis. We designed a hierarchical RAG architecture with six key components: hierarchical intent classification, query rewriting, two-stage retrieval (dense retrieval with FAISS followed by cross-encoder reranking with Clinical-Longformer), and specialist routing. We conducted systematic ablation studies across seven configurations on 476 medical questions from MedQA benchmarks. Each configuration was evaluated independently, with GPT-4o mini as an LLM judge, on four metrics scored on a 1-5 Likert scale: context relevance, completeness, faithfulness, and correctness. Each metric was assessed in a separate evaluation call to minimize inter-metric bias. Statistical significance was assessed with paired t-tests, with effect sizes reported as Cohen’s d. The full system achieved an overall score of 3.64/5.0. Systematic ablation revealed two critical components: reranking (removal: -0.24 overall, P
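The statistical comparison described in the abstract (a paired t-test with Cohen's d between the full system and an ablated configuration) can be sketched as follows. This is a minimal illustration and not the authors' code: the function name `paired_t_and_cohens_d`, the toy scores, and the choice of the paired-differences form of Cohen's d (mean difference divided by the standard deviation of the differences) are all assumptions, since the abstract does not say which variant was used. The p-value is left out because it requires the t distribution with n - 1 degrees of freedom, which the Python standard library does not provide.

```python
import math
import statistics


def paired_t_and_cohens_d(full_scores, ablated_scores):
    """Return (mean_diff, t_statistic, cohens_d) for paired per-question scores.

    Differences are computed as ablated - full, so a negative mean_diff
    means that removing the component lowered the score.
    Cohen's d here is the paired-differences variant (an assumption).
    """
    if len(full_scores) != len(ablated_scores):
        raise ValueError("paired samples must have the same length")
    diffs = [a - f for f, a in zip(full_scores, ablated_scores)]
    n = len(diffs)
    if n < 2:
        raise ValueError("need at least two paired observations")

    mean_diff = statistics.mean(diffs)
    sd_diff = statistics.stdev(diffs)  # sample standard deviation (n - 1)
    if sd_diff == 0:
        raise ValueError("all differences are identical; t is undefined")

    t_stat = mean_diff / (sd_diff / math.sqrt(n))  # paired t statistic
    cohens_d = mean_diff / sd_diff                 # effect size
    return mean_diff, t_stat, cohens_d


# Hypothetical 1-5 judge scores for six questions (illustration only,
# not data from the study).
full = [4, 4, 3, 5, 4, 3]
ablated = [3, 4, 3, 4, 4, 2]

mean_diff, t_stat, d = paired_t_and_cohens_d(full, ablated)
print(f"mean diff = {mean_diff:.2f}, t = {t_stat:.3f}, d = {d:.3f}")
```

Pairing by question matters here because every configuration answers the same 476 questions, so per-question differences remove the variance caused by question difficulty.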
Hakan Emekci
Daniel Quillan Roxas
Black Sea Journal of Engineering and Science
TED University
Emekci et al. studied this question.
www.synapsesocial.com/papers/69b8f11edeb47d591b8c5ff9 — DOI: https://doi.org/10.34248/bsengineering.1849342