What question did this study set out to answer?

This research aims to enhance question answering performance in real-time using a multimodal approach.

May 9, 2026 | Open Access

Retrieval Augmented Generation Using Multimodal Large Language Models for Real-Time Knowledge-Grounded Question Answering

Key Points

Introduced the MultiRAG framework, which integrates multimodal large language models with a real-time, knowledge-grounded QA system.
Used a dense bi-encoder retrieval backbone with a cross-modal fusion module for retrieval, and a vision-language model as the generative backbone.
Conducted experiments on four benchmark datasets: Natural Questions, WebQA, MultiModalQA, and RKUB-2024.
Achieved 87.3% Exact Match on open-domain QA and a 91.4% answer faithfulness score.
Demonstrated a 6.7× reduction in hallucination rate compared to vanilla LLM baselines.
Reduced hallucination by 82% over standard LLM deployment and outperformed all retrieval-augmented baselines by 4.2–9.8 percentage points across evaluation metrics.
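
For readers unfamiliar with the Exact Match metric cited in the key points, the following is a minimal sketch of how it is commonly computed in open-domain QA. The normalization steps (lowercasing, stripping punctuation and English articles) follow a widely used convention and are an assumption here; the page does not say which normalization the authors applied. The function names are illustrative.

```python
import re
import string


def normalize_answer(text: str) -> str:
    """Lowercase, drop punctuation and English articles, collapse whitespace.

    Assumption: a common QA normalization, not confirmed as the paper's own.
    """
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(prediction: str, references: list[str]) -> bool:
    """True if the normalized prediction equals any normalized reference."""
    pred = normalize_answer(prediction)
    return any(pred == normalize_answer(ref) for ref in references)


def exact_match_score(predictions: list[str], references: list[list[str]]) -> float:
    """Percentage of predictions that exactly match one of their references."""
    if not predictions:
        return 0.0
    hits = sum(exact_match(p, refs) for p, refs in zip(predictions, references))
    return 100.0 * hits / len(predictions)
```

Under this convention, a score of 87.3% would mean that 87.3% of predicted answers matched a reference answer after normalization.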

Abstract

The exponential growth of heterogeneous digital information across structured and unstructured repositories presents a critical challenge for large language models (LLMs): they cannot access and reason over dynamically evolving knowledge without costly retraining. This paper introduces a comprehensive Retrieval Augmented Generation (RAG) framework that integrates multimodal large language models (MLLMs) with real-time, knowledge-grounded question answering systems. The proposed architecture, MultiRAG, combines a dense bi-encoder retrieval backbone with a cross-modal fusion module capable of jointly indexing and retrieving text, images, tables, and structured data. Retrieved multimodal evidence is processed by a vision-language model (VLM) serving as the generative backbone, conditioned on retrieved context through a novel cross-attention grounding mechanism that attenuates hallucination by enforcing faithfulness constraints at the token level. Experiments on four benchmark datasets (Natural Questions, WebQA, MultiModalQA, and a custom real-time knowledge update benchmark, RKUB-2024) demonstrate that MultiRAG achieves 87.3% Exact Match on open-domain QA, a 91.4% answer faithfulness score, and a 6.7× reduction in hallucination rate compared to vanilla LLM baselines. The real-time knowledge ingestion pipeline averages 340 ms of latency per document, supporting continuous knowledge grounding without model fine-tuning. The system reduces hallucination by 82% over standard LLM deployment and outperforms all retrieval-augmented baselines by 4.2–9.8 percentage points across evaluation metrics.
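
The dense bi-encoder retrieval step mentioned in the abstract can be illustrated with a small, text-only sketch. In a bi-encoder, the query and each document are embedded independently and ranked by vector similarity. The `toy_encode` function below is a hypothetical stand-in (a hashed bag-of-words) for a learned neural encoder; it is not MultiRAG's encoder, and the cross-modal fusion module and grounding mechanism are not modeled here.

```python
import hashlib
import math

DIM = 64  # assumed embedding size for the toy encoder


def toy_encode(text: str) -> list[float]:
    """Hypothetical stand-in for a learned encoder: hashed bag-of-words, L2-normalized."""
    vec = [0.0] * DIM
    for token in text.lower().split():
        # md5 gives a hash that is stable across runs, unlike built-in hash()
        bucket = int(hashlib.md5(token.encode("utf-8")).hexdigest(), 16) % DIM
        vec[bucket] += 1.0
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm else vec


def retrieve(query: str, corpus: list[str], k: int = 2) -> list[tuple[float, str]]:
    """Return the k documents with the highest cosine similarity to the query."""
    q = toy_encode(query)
    # In a real system, document vectors would be computed once and indexed.
    doc_vecs = [toy_encode(doc) for doc in corpus]
    scored = [
        (sum(a * b for a, b in zip(q, d)), doc)
        for d, doc in zip(doc_vecs, corpus)
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


corpus = [
    "mount everest is the highest mountain",
    "the eiffel tower is in paris",
    "python is a programming language",
]
results = retrieve("the eiffel tower is in paris", corpus, k=2)
```

Because documents are encoded independently of the query, their vectors can be precomputed and updated as new documents arrive, which is the property that makes this style of retrieval compatible with continuous ingestion.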
