What question did this study set out to answer?

To improve clinical reasoning in retrieval-augmented generation systems using a multi-agent framework.

May 18, 2026 | Open Access

ER-MedRAG: A multi-agent reinforcement learning framework for reliable clinical retrieval-augmented reasoning

Key Points

Introduced the ER-MedRAG framework, whose Extractor-Respondent architecture converts retrieved evidence into structured reasoning triplets.
Trained the extractor with Direct Preference Optimization and the respondent with Group Relative Policy Optimization.
Evaluated on six medical question answering benchmarks with various open-source base models.
ER-MedRAG outperformed RAG and reinforcement-learning baselines with 3% to 6% accuracy gains.
Achieved notable improvements on reasoning-intensive benchmarks such as MMLU-ProM and GPQA-M.
Reduced output entropy, indicating more stable and reliable clinical reasoning.
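
The Group Relative Policy Optimization step named in the key points can be illustrated with its core idea: each sampled response is scored relative to the other responses drawn for the same question. The sketch below is a generic illustration of that normalization, not the authors' training code; the function name `group_relative_advantages`, the `eps` stabilizer, and the choice of population standard deviation are assumptions made for the example.

```python
import statistics


def group_relative_advantages(rewards: list[float], eps: float = 1e-8) -> list[float]:
    """Normalize each response's reward against the mean and spread of its group."""
    mean = statistics.fmean(rewards)
    # Assumption: population standard deviation; implementations may differ.
    std = statistics.pstdev(rewards)
    # eps avoids division by zero when every response gets the same reward.
    return [(r - mean) / (std + eps) for r in rewards]


# Made-up rewards for four sampled answers to one question (1 = correct, 0 = wrong).
advantages = group_relative_advantages([1.0, 0.0, 0.0, 1.0])
```

Responses that beat their group's average receive positive advantages and the rest receive negative ones, so no separate value model is needed to provide a baseline.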

Abstract

Large language models (LLMs) have recently demonstrated impressive advances in complex reasoning, yet their performance in clinical natural language processing (NLP) remains limited. Clinical tasks require grounding in extensive domain-specific knowledge, precise evidence integration, and reliable multi-step reasoning, capabilities that current LLMs struggle to achieve. Retrieval-Augmented Generation (RAG) offers a promising solution by incorporating external medical knowledge without additional model training. However, existing clinical RAG systems face three major challenges: imprecise retrieval from long and complex medical documents, difficulty transforming retrieved evidence into coherent reasoning processes, and high sensitivity to retrieval noise. To address these limitations, we introduce ER-MedRAG (Extractor-Respondent Medical Retrieval-Augmented Generation), a multi-agent reinforcement learning framework designed to enhance clinical reasoning in RAG systems. ER-MedRAG employs an Extractor–Respondent architecture that first performs a coarse-to-fine hybrid retrieval process to identify highly relevant evidence snippets. The extractor agent then converts each snippet into a structured condition–relation–conclusion reasoning triplet. These triplets are subsequently concatenated into a unified representation and passed to the respondent agent to guide clinical decision-making. To strengthen each agent’s specialized capabilities, we develop a two-stage reinforcement learning paradigm: the extractor is optimized using Direct Preference Optimization (DPO) to generate concise and informative reasoning triplets, while the respondent is trained with Group Relative Policy Optimization (GRPO) to effectively leverage structured evidence and remain robust to retrieval noise.
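
The extractor-to-respondent handoff described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the `ReasoningTriplet` class, its field names, the `format_triplet` and `build_respondent_prompt` helpers, and the text layout are all assumptions made for the example. The abstract states only that each snippet becomes a condition-relation-conclusion triplet and that the triplets are concatenated before reaching the respondent.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReasoningTriplet:
    """Hypothetical container for one condition-relation-conclusion triplet."""
    condition: str
    relation: str
    conclusion: str


def format_triplet(t: ReasoningTriplet) -> str:
    # Assumed serialization; the actual format is not given in the abstract.
    return f"IF {t.condition} | {t.relation} | THEN {t.conclusion}"


def build_respondent_prompt(question: str, triplets: list[ReasoningTriplet]) -> str:
    # Concatenate all triplets into one numbered evidence block,
    # then append the question for the respondent agent.
    evidence = "\n".join(
        f"[{i}] {format_triplet(t)}" for i, t in enumerate(triplets, start=1)
    )
    return f"Evidence:\n{evidence}\n\nQuestion: {question}\nAnswer:"


# Placeholder content for illustration only; not taken from the paper.
example = build_respondent_prompt(
    "Which option is most likely?",
    [ReasoningTriplet("finding X is present", "supports", "diagnosis Y")],
)
```

Keeping each triplet on its own numbered line is one possible design choice: it gives the respondent a compact, uniform view of the evidence regardless of how long the source snippets were.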
We evaluate ER-MedRAG on six medical question answering benchmarks spanning multiple difficulty levels (MedQA, MedMCQA, PubMedQA, MMLU-ProM, GPQA-M, and MedXpertQA), using both 7B/8B and 70B open-source base models. Experimental results demonstrate that ER-MedRAG consistently outperforms strong RAG and reinforcement-learning-based baselines, achieving accuracy gains ranging from 3% to 6% across the six datasets, with especially pronounced improvements on reasoning-intensive benchmarks such as MMLU-ProM, GPQA-M, and MedXpertQA. Moreover, ER-MedRAG reduces output entropy, indicating more stable and reliable clinical reasoning.
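
One way to make the entropy claim concrete is the Shannon entropy of a model's probability distribution over answer options: lower values mean the probability mass is concentrated on fewer options. The sketch below is an assumption about how such a measure could be computed, since the abstract does not define "output entropy"; the function `answer_entropy` and the example distributions are hypothetical.

```python
import math


def answer_entropy(probs: list[float]) -> float:
    """Shannon entropy (in nats) of a probability distribution over answer options."""
    if not math.isclose(sum(probs), 1.0, abs_tol=1e-6):
        raise ValueError("probabilities must sum to 1")
    # Terms with p == 0 are skipped; they contribute nothing to the sum.
    return -sum(p * math.log(p) for p in probs if p > 0)


# Made-up distributions over four options (A-D), for illustration only.
uncertain = [0.25, 0.25, 0.25, 0.25]
confident = [0.97, 0.01, 0.01, 0.01]
```

Under this definition the uniform distribution reaches the maximum for four options, `math.log(4)` nats, and any more peaked distribution scores lower.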

Cite This Study

Shi et al. studied this question.

synapsesocial.com/papers/6a0aaccf5ba8ef6d83b70252 https://doi.org/10.1007/s44443-026-00825-0