The exponential growth of heterogeneous digital information across structured and unstructured repositories presents a critical challenge for large language models (LLMs): they cannot access or reason over dynamically evolving knowledge without costly retraining. This paper introduces a comprehensive Retrieval-Augmented Generation (RAG) framework that integrates multimodal large language models (MLLMs) with real-time, knowledge-grounded question answering systems. The proposed architecture, MultiRAG, combines a dense bi-encoder retrieval backbone with a cross-modal fusion module that jointly indexes and retrieves text, images, tables, and structured data. Retrieved multimodal evidence is processed by a vision-language model (VLM) serving as the generative backbone, conditioned on the retrieved context through a novel cross-attention grounding mechanism that reduces hallucination by enforcing faithfulness constraints at the token level. Experiments on four benchmark datasets (Natural Questions, WebQA, MultiModalQA, and a custom real-time knowledge update benchmark, RKUB-2024) show that MultiRAG achieves 87.3% Exact Match on open-domain QA, a 91.4% answer faithfulness score, and a 6.7× reduction in hallucination rate compared to vanilla LLM baselines. The real-time knowledge ingestion pipeline averages 340 ms of latency per document, supporting continuous knowledge grounding without model fine-tuning. The system reduces hallucination by 82% relative to standard LLM deployment and outperforms all retrieval-augmented baselines by 4.2–9.8 percentage points across evaluation metrics.
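To make the retrieval step concrete, the following is a minimal sketch of dense retrieval over a mixed-modality index, with a new item ingested at query time and no retraining. It is not the MultiRAG implementation: the embedding vectors are hand-written stand-ins for bi-encoder outputs, and the names `retrieve`, `ingest`, and the `text:`/`image:`/`table:` identifiers are hypothetical, chosen only for illustration.

```python
import math
from typing import Dict, List, Tuple

Vector = List[float]


def normalize(v: Vector) -> Vector:
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        return list(v)
    return [x / norm for x in v]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity: dot product of the two unit-normalized vectors."""
    return sum(x * y for x, y in zip(normalize(a), normalize(b)))


def retrieve(query_vec: Vector, index: Dict[str, Vector], k: int = 2) -> List[Tuple[str, float]]:
    """Return the k index entries most similar to the query embedding."""
    scored = [(doc_id, cosine(query_vec, vec)) for doc_id, vec in index.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


def ingest(index: Dict[str, Vector], doc_id: str, vec: Vector) -> None:
    """Add a new item to the index; the encoder and generator are untouched."""
    index[doc_id] = vec


# Assumption: all modalities are embedded into one shared vector space.
# These toy 3-d vectors stand in for real encoder outputs.
index: Dict[str, Vector] = {
    "text:intro": [0.9, 0.1, 0.0],
    "image:fig1": [0.1, 0.9, 0.1],
    "table:results": [0.7, 0.2, 0.6],
}

query = [1.0, 0.0, 0.0]
top_before = retrieve(query, index, k=2)

# A newly ingested document becomes retrievable immediately.
ingest(index, "text:update", [1.0, 0.0, 0.0])
top_after = retrieve(query, index, k=2)

print([doc_id for doc_id, _ in top_before])
print([doc_id for doc_id, _ in top_after])
```

In a full system the retrieved items would be passed to the generator as context; the sketch stops at retrieval because the abstract does not specify the fusion or grounding details.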
Dr. K. Sujatha (Thu,) studied this question.