What question did this study set out to answer?

This research aims to design and evaluate a multi-agent AI virtual assistant for clinical environments.

January 14, 2026 · Open Access

Designing an Architecture of a Multi-Agentic AI-Powered Virtual Assistant Using LLMs and RAG for a Medical Clinic

Key Points

Designed an agentic virtual assistant using LLMs and RAG technologies.
Implemented an orchestrator architecture for user query routing to specialized tools.
Combined LangChain and LangGraph with visualization tools for interactive analysis.
Evaluated performance using qualitative and quantitative metrics, including ROUGE, BLEU, LLM-as-a-judge and sentiment analysis.
The multi-agent architecture improved reliability and interpretability for clinical tasks.
Evaluation metrics confirmed improved response quality and grounding compared with earlier models.
LLM-as-a-judge scores showed high relevance and completeness for the new system's responses.
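
The orchestrator routing named in the key points can be illustrated with a minimal sketch. This is not the paper's implementation: the paper uses an LLM-driven central agent built with LangChain and LangGraph, whereas the version below substitutes simple keyword rules for the routing decision so that it runs without any external service. All function names, tool names and keywords here are hypothetical.

```python
from typing import Callable, Dict, Tuple

# Hypothetical stand-ins for the three specialized tools described in the paper.
# Each one only returns a placeholder string; a real tool would call RAG,
# a document parser, or a scheduling backend.
def answer_question(query: str) -> str:
    return f"[qa] would retrieve clinic documents for: {query}"

def interpret_document(query: str) -> str:
    return f"[document] would interpret the supplied document for: {query}"

def schedule_appointment(query: str) -> str:
    return f"[scheduling] would look up free slots for: {query}"

TOOLS: Dict[str, Callable[[str], str]] = {
    "qa": answer_question,
    "document": interpret_document,
    "scheduling": schedule_appointment,
}

# Assumption: keyword rules replace the LLM's routing decision for illustration.
ROUTING_KEYWORDS: Dict[str, Tuple[str, ...]] = {
    "scheduling": ("appointment", "book", "reschedule", "schedule"),
    "document": ("report", "lab result", "document", "prescription"),
}

def route(query: str) -> str:
    """Return the name of the tool that should handle the query."""
    text = query.lower()
    for tool_name, keywords in ROUTING_KEYWORDS.items():
        if any(keyword in text for keyword in keywords):
            return tool_name
    # Anything unmatched falls back to retrieval-augmented Q&A.
    return "qa"

def orchestrate(query: str) -> Tuple[str, str]:
    """Route the query, run the chosen tool, and return (tool name, reply)."""
    tool_name = route(query)
    return tool_name, TOOLS[tool_name](query)

if __name__ == "__main__":
    for q in ("Can I book an appointment for Monday?",
              "What does my lab result mean?",
              "What are your opening hours?"):
        print(orchestrate(q))
```

Keeping the routing decision in one function and the tools in a lookup table mirrors the separation the key points describe: the central agent decides where a query goes, and each tool stays responsible for a single task.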

Abstract

This paper presents the design, implementation and evaluation of an agentic virtual assistant (VA) for a medical clinic, combining large language models (LLMs) with retrieval-augmented generation (RAG) technology and multi-agent artificial intelligence (AI) frameworks to enhance reliability, clinical accuracy and explainability. The assistant has multiple functionalities and is built around an orchestrator architecture in which a central agent dynamically routes user queries to specialized tools for retrieval-augmented question answering (Q&A), document interpretation and appointment scheduling. The implementation combines LangChain and LangGraph with interactive visualizations that track reasoning steps, while the prompts written for Gemini 2.5 Flash define tool usage and strict formatting rules to maintain reliability and mitigate hallucinations. Prompt engineering plays an important role in the implementation and is designed to assist the patient during human–computer interaction. Evaluation through qualitative and quantitative metrics, including ROUGE, BLEU, LLM-as-a-judge and sentiment analysis, shows that the multi-agent architecture improves reliability, interpretability, accuracy and context-aware performance, as well as alignment with medical requirements, while supporting diverse clinical tasks. Furthermore, the evaluation shows that Gemini 2.5 Flash combined with clinic-specific RAG significantly improves response quality, grounding and coherence compared with earlier models. SBERT analyses confirm strong semantic alignment across configurations, while LLM-as-a-judge scores highlight the superior relevance and completeness of the Gemini 2.5 Flash RAG setup. Although some limitations remain, the updated system provides a more reliable and context-aware solution for clinical question answering.
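
To give a sense of what the overlap-based metrics in the evaluation measure, the sketch below computes a unigram-overlap F1 score between a generated answer and a reference answer. It is a simplified stand-in in the spirit of ROUGE-1, not the paper's evaluation pipeline: it assumes whitespace tokenization and lowercasing only, and the function name and example sentences are hypothetical.

```python
from collections import Counter

def unigram_f1(candidate: str, reference: str) -> float:
    """Unigram-overlap F1 between a candidate answer and a reference answer.

    Assumption: whitespace tokenization and lowercasing only. Standard ROUGE
    implementations offer additional options such as stemming.
    """
    cand_tokens = candidate.lower().split()
    ref_tokens = reference.lower().split()
    if not cand_tokens or not ref_tokens:
        return 0.0
    # Multiset intersection counts each shared word at most as often as it
    # appears in both texts.
    overlap = sum((Counter(cand_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(cand_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)

if __name__ == "__main__":
    print(unigram_f1("the clinic opens at nine", "the clinic closes at five"))
```

Scores of this kind reward word overlap rather than meaning, which is one reason an evaluation may pair them with embedding-based comparisons such as SBERT and with LLM-as-a-judge ratings, as the abstract reports.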
