
September 10, 2025 | Open Access

Agentic memory-augmented retrieval and evidence grounding for medical question-answering tasks

Key points

The system achieved 82.98% and 86.24% accuracy on USMLE Step 1 and Step 2, respectively.
Comparative evaluations showed the agentic system surpassing or closely matching state-of-the-art medical LLMs.
It uses a cache-and-prune memory bank for efficient long-context inference beyond standard LLM limits.
The results suggest that combining tool-augmented and evidence-grounded reasoning is a promising approach to reliable, scalable medical AI systems.

Abstract

Objective: To evaluate whether a tool-using, agent-based system built on large language models (LLMs) outperforms standalone LLMs on medical question-answering (QA) tasks.

Methods: We developed a unified, open-source LLM-based agentic system that integrates document retrieval, re-ranking, evidence grounding, and diagnosis generation to support dynamic, multi-step medical reasoning. The system features a lightweight retrieval-augmented generation pipeline coupled with a cache-and-prune memory bank, enabling efficient long-context inference beyond standard LLM limits. It invokes specialized tools autonomously, eliminating the need for manual prompt engineering or brittle multi-stage templates. We compared the agentic system against standalone LLMs on several medical QA benchmarks.

Results: Evaluated on five well-known medical QA benchmarks, our system outperforms or closely matches state-of-the-art proprietary and open-source medical LLMs in both multiple-choice and open-ended formats. Specifically, it achieved accuracies of 82.98% on USMLE Step 1 and 86.24% on USMLE Step 2, surpassing GPT-4's 80.67% and 81.67%, respectively, while closely matching it on USMLE Step 3 (88.52% vs. 89.78%).

Conclusion: Our findings highlight the value of combining tool-augmented and evidence-grounded reasoning strategies to build reliable and scalable medical AI systems.
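The abstract does not specify how the retrieval, re-ranking, and cache-and-prune memory bank are implemented. As a rough illustration only, the sketch below assumes a toy word-overlap relevance score and a fixed token budget. Every name in it (MemoryBank, overlap_score, retrieve_and_rerank, max_tokens) is hypothetical and is not taken from the paper.

```python
from dataclasses import dataclass, field


@dataclass
class MemoryEntry:
    text: str
    score: float  # relevance to the question (toy score, an assumption)
    tokens: int   # whitespace-token count, a stand-in for real tokenization


@dataclass
class MemoryBank:
    """Hypothetical cache-and-prune store bounded by a token budget."""
    max_tokens: int
    entries: list[MemoryEntry] = field(default_factory=list)

    def total_tokens(self) -> int:
        return sum(e.tokens for e in self.entries)

    def cache(self, text: str, score: float) -> None:
        # Cache the new passage, then prune back to the budget.
        self.entries.append(MemoryEntry(text, score, len(text.split())))
        self.prune()

    def prune(self) -> None:
        # Keep the highest-scoring entries; drop the lowest until we fit.
        self.entries.sort(key=lambda e: e.score, reverse=True)
        while self.entries and self.total_tokens() > self.max_tokens:
            self.entries.pop()

    def context(self) -> str:
        # Evidence that would be placed in the LLM prompt.
        return "\n".join(e.text for e in self.entries)


def overlap_score(question: str, passage: str) -> float:
    # Fraction of question words that also appear in the passage.
    q = set(question.lower().split())
    p = set(passage.lower().split())
    return len(q & p) / len(q) if q else 0.0


def retrieve_and_rerank(question: str, corpus: list[str], top_k: int) -> list[str]:
    # Toy retrieval and re-ranking in one step: sort by overlap, keep top_k.
    ranked = sorted(corpus, key=lambda p: overlap_score(question, p), reverse=True)
    return ranked[:top_k]


corpus = [
    "Metformin is a first-line drug for type 2 diabetes",
    "Aspirin inhibits cyclooxygenase enzymes",
    "Insulin lowers blood glucose in diabetes",
]
question = "first-line drug for type 2 diabetes"

bank = MemoryBank(max_tokens=16)
for passage in retrieve_and_rerank(question, corpus, top_k=2):
    bank.cache(passage, overlap_score(question, passage))

print(bank.context())
```

With a budget of 16 tokens both retrieved passages fit. Lowering max_tokens to 12 would prune the lower-scoring insulin passage and keep only the metformin one. A real system would presumably use a learned retriever, a dedicated re-ranker, and model-specific token counts in place of these stand-ins.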
