Retrieval-Augmented Generation (RAG) has become the dominant paradigm for grounding large language models (LLMs) in external knowledge. However, current RAG systems suffer from critical limitations: static retrieval regardless of query complexity, poor handling of multi-hop reasoning, context window pollution from irrelevant passages, and inability to self-correct retrieval failures. We propose ACRA (Adaptive Contextual Retrieval Architecture), a novel multi-phase RAG framework that dynamically adjusts its retrieval depth, employs self-reflective verification loops, and incorporates speculative retrieval planning. ACRA integrates insights from reasoning-effort scaling, chain-of-thought decomposition, and self-supervised representation learning (drawing from JEPA-family architectures) to create a system that reasons about what to retrieve before retrieving it. This paper provides: (1) a comprehensive survey of model evaluation benchmarks critical for selecting generator models, (2) a taxonomy of existing RAG approaches and their failure modes, (3) the complete ACRA architecture specification, and (4) a phased implementation roadmap. Our theoretical analysis suggests ACRA can reduce retrieval calls by 40–60% on simple queries while improving answer quality on complex multi-hop questions by 25–35% compared to standard RAG pipelines. Empirical evaluation on a 50-document, 5-domain knowledge base with 30 annotated queries confirms these predictions: ACRA achieves 75.7% fact coverage on complex (L3) queries, a +5.0% improvement over Naive RAG and +21.8% over Advanced RAG, while matching perfect scores on simple queries with adaptive retrieval depth.
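The idea of adjusting retrieval depth to query complexity and widening retrieval when a verification check fails can be illustrated with a small toy sketch. This is not the ACRA implementation, which the abstract does not specify. Every name here (`classify_complexity`, `DEPTH_BY_LEVEL`, `covered`, `adaptive_retrieve`) is hypothetical, the word-overlap ranker stands in for a real retriever, and the coverage check is a deliberately simple placeholder for a self-reflective verification step.

```python
import re

# Assumed mapping from complexity level to initial retrieval depth (illustrative only).
DEPTH_BY_LEVEL = {1: 1, 2: 3, 3: 6}
# Assumed cue words hinting at multi-hop questions (illustrative only).
MULTI_HOP_CUES = {"and", "then", "compare", "versus", "why"}
STOPWORDS = {"a", "an", "the", "of", "is", "what", "which", "in", "to"}


def tokens(text):
    """Lowercase alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def classify_complexity(query):
    """Toy heuristic: more cue words or a longer query means a higher level."""
    words = tokens(query)
    cues = sum(w in MULTI_HOP_CUES for w in words)
    if cues >= 2 or len(words) > 20:
        return 3
    if cues == 1 or len(words) > 10:
        return 2
    return 1


def retrieve(query, corpus, k):
    """Rank documents by word overlap with the query; stand-in for a real retriever."""
    q = set(tokens(query))
    ranked = sorted(corpus, key=lambda d: len(q & set(tokens(d))), reverse=True)
    return ranked[:k]


def covered(query, passages):
    """Placeholder verification: every non-stopword query term appears in some passage."""
    needed = set(tokens(query)) - STOPWORDS
    seen = set()
    for p in passages:
        seen.update(tokens(p))
    return needed <= seen


def adaptive_retrieve(query, corpus, max_rounds=3):
    """Start at a depth chosen from query complexity; widen if verification fails."""
    k = DEPTH_BY_LEVEL[classify_complexity(query)]
    passages = []
    for _ in range(max_rounds):
        passages = retrieve(query, corpus, k)
        if covered(query, passages) or k >= len(corpus):
            break
        k += 2  # self-correction step: retrieve more passages and re-check
    return passages, k


if __name__ == "__main__":
    corpus = [
        "paris is the capital of france",
        "berlin is the capital of germany",
        "the seine flows through paris",
        "the spree flows through berlin",
    ]
    print(adaptive_retrieve("capital of france", corpus))
    print(adaptive_retrieve("what flows through the capital of france", corpus))
```

In this sketch a simple lookup is answered with a single passage, while the second query fails the coverage check at depth 1 and is retried at depth 3. A real system would replace the heuristics with a learned classifier, a dense or hybrid retriever, and a model-based verifier.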
Krushna Dere
synapsesocial.com/papers/69e320e740886becb6540153 — DOI: https://doi.org/10.5281/zenodo.19615602