What question did this study set out to answer?

The aim is to enhance the diagnostic capabilities of large language models in clinical scenarios.

March 28, 2026 (Open Access)

Grounding large language models in clinical diagnostics

Key Points

Aim: enhance the diagnostic capabilities of large language models (LLMs) in clinical scenarios.
Developed ClinDiag-GPT, a specialized LLM fine-tuned for diagnostic tasks.
Used ClinDiag-Framework and ClinDiag-Benchmark (4,421 real-world cases) for evaluation.
Compared ClinDiag-GPT with existing LLMs on real-world clinical cases.
ClinDiag-GPT outperformed all baseline models in diagnostic accuracy and procedural performance.
Physicians collaborating with ClinDiag-GPT achieved higher diagnostic accuracy and efficiency than either working alone.
Existing LLMs showed limitations in dynamic diagnostic workflows and frequently committed clinical errors.

Abstract

Although large language models (LLMs) possess extensive medical knowledge, they often struggle to emulate the complex, iterative process of real-world clinical diagnosis. To address this limitation, we present ClinDiag-GPT, a specialized LLM fine-tuned to execute full diagnostic procedures, supported by the ClinDiag-Framework evaluation system and ClinDiag-Benchmark, a dataset comprising 4,421 real-world cases. Our evaluation shows that existing LLMs, including GPT-4o-mini, GPT-4o, Claude-3-Haiku, Qwen2.5-72b, Qwen2.5-32b, and Qwen2.5-14b, are proficient in static tasks but fall short in dynamic diagnostic workflows and frequently commit clinical errors. In contrast, ClinDiag-GPT, trained on clinical cases, outperforms all baseline models in both diagnostic accuracy and procedural performance. Furthermore, a comparative analysis reveals that collaboration between physicians and ClinDiag-GPT yields higher diagnostic accuracy and efficiency than either working alone, demonstrating the utility of ClinDiag-GPT as a clinical assistant.

Cite This Study

Chen et al. studied this question.

synapsesocial.com/papers/69c770888bbfbc51511e0a59
https://doi.org/10.1038/s41467-026-70274-w