What question did this study set out to answer?

This research aims to develop and evaluate a clinical simulation system using large language models for improving medical education in gastroenterology.

June 26, 2026 · Open Access

Development and pilot evaluation of a large language model-based clinical simulation system for medical education in gastroenterology

Key Points

The system combines an LLM-based clinical case generator, an LLM-based patient simulator, and a web chat interface.
Two gastroenterology fellows each performed 20 clinical case simulations as initial testers, and their feedback guided refinement of the system.
Six independent gastroenterology fellows then evaluated the final system on the same 20 cases, rating four domains on a 5-point Likert scale.
Educational value scored a mean of 4.57 ± 0.63, the highest among the evaluated domains.
Scores differed significantly across domains (p < 0.01).
In an exploratory comparison, the evaluation panellists rated every domain significantly higher than the initial testers did (all p < 0.01).
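The two LLM components named in the key points can be pictured with a minimal sketch. Everything here is an assumption made for illustration: the `LLM` callable type, the prompt wording, the `generate_case` and `PatientSimulator` names, and the offline `stub_llm` are hypothetical and are not taken from the study, whose implementation is not described on this page.

```python
from dataclasses import dataclass, field
from typing import Callable

# Hypothetical interface: any function that maps a prompt to generated text.
LLM = Callable[[str], str]


def generate_case(llm: LLM, topic: str) -> str:
    """Ask the model for a teaching case (prompt wording is an assumption)."""
    return llm(f"Write a gastroenterology teaching case about {topic}.")


@dataclass
class PatientSimulator:
    """Keeps the case and the running transcript, and answers as the patient."""
    llm: LLM
    case_text: str
    transcript: list[tuple[str, str]] = field(default_factory=list)

    def reply(self, learner_message: str) -> str:
        # Replay the conversation so far so the model can stay consistent.
        history = "\n".join(f"{who}: {text}" for who, text in self.transcript)
        prompt = (
            "You are role-playing the patient in this case.\n"
            f"Case: {self.case_text}\n"
            f"{history}\n"
            f"Learner: {learner_message}\n"
            "Patient:"
        )
        answer = self.llm(prompt)
        self.transcript.append(("Learner", learner_message))
        self.transcript.append(("Patient", answer))
        return answer


def stub_llm(prompt: str) -> str:
    """Stand-in for a real model so the sketch runs offline (placeholder text)."""
    if prompt.startswith("Write a"):
        return "Placeholder case text."
    return "Placeholder patient answer."


case = generate_case(stub_llm, "a hypothetical topic")
simulator = PatientSimulator(stub_llm, case)
print(simulator.reply("What brings you in today?"))
```

In a sketch like this, a web chat interface would simply call `reply` once per learner message; swapping `stub_llm` for a real model client is the only change the other parts would need.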

Abstract

Background: Medical education relies on experience to develop clinical reasoning skills, yet patient access is often limited. Large language models (LLMs) offer an accessible alternative for simulating patient encounters. We aimed to develop and evaluate an LLM-based clinical simulation system for gastroenterology that simulates entire clinical encounters in natural language, from history-taking through to treatment and follow-up planning.

Methods: The study was conducted in three phases. First, a system comprising an LLM-based clinical case generator, an LLM-based patient simulator, and a web chat interface was developed and iteratively refined. Second, two gastroenterology fellows performed 20 clinical case simulations each as initial testers, providing feedback that guided further system adjustments. Third, six independent gastroenterology fellows, blinded to the development process, evaluated the final system by completing the same 20 clinical simulations. All evaluators rated the system using a 5-point Likert scale across four domains: case presentation accuracy, adaptability to user interactions, realism, and educational value.

Results: A total of 160 evaluations were collected across 20 cases and 8 raters. At final evaluation, all domains scored highly: educational value (mean 4.57 ± 0.63), accuracy (4.42 ± 0.63), adaptability (4.33 ± 0.75), and realism (4.32 ± 0.81). Scores differed significantly across domains (p < 0.01), with educational value rated highest. In an exploratory comparison, evaluation panellists rated all domains significantly higher than initial testers (all p < 0.01).

Conclusions: The LLM-based clinical simulator demonstrated high accuracy, realism, adaptability, and educational value for gastroenterology case simulation in this pilot evaluation, and may represent an accessible and cost-effective tool for clinical reasoning training.
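The rating design in the abstract (20 cases, 2 initial testers plus 6 panellists, four Likert domains reported as mean ± SD) can be sketched as a small calculation. This is only an illustration under stated assumptions: the ratings below are made up and are not data from the study, the function names are hypothetical, and the use of the sample standard deviation is an assumption because the page does not say which estimator the authors used.

```python
from statistics import mean, stdev

DOMAINS = ("accuracy", "adaptability", "realism", "educational_value")
N_CASES = 20
N_INITIAL_TESTERS = 2
N_PANELLISTS = 6


def expected_evaluations(n_cases: int, n_raters: int) -> int:
    """Every rater completes every case once, so evaluations = cases x raters."""
    return n_cases * n_raters


def summarise(ratings: list[int]) -> tuple[float, float]:
    """Return (mean, sample standard deviation) of 5-point Likert ratings."""
    if any(r < 1 or r > 5 for r in ratings):
        raise ValueError("Likert ratings must be between 1 and 5")
    return mean(ratings), stdev(ratings)


# Made-up ratings for illustration only; NOT data from the study.
toy_ratings = {
    "accuracy": [4, 5, 4, 5, 4, 4],
    "adaptability": [4, 4, 5, 3, 5, 4],
    "realism": [5, 4, 4, 3, 5, 4],
    "educational_value": [5, 5, 4, 5, 4, 5],
}

total = expected_evaluations(N_CASES, N_INITIAL_TESTERS + N_PANELLISTS)
print(total)  # 160, matching the count reported in the abstract

for domain in DOMAINS:
    m, sd = summarise(toy_ratings[domain])
    print(f"{domain}: {m:.2f} ± {sd:.2f}")
```

The count check reproduces the 160 evaluations stated in the Results; the per-domain summaries only show the shape of the reported figures and say nothing about the study's actual values.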

Cite This Study

Maimaris et al. studied this question.

synapsesocial.com/papers/6a3e1a93030ad1a9b309301a https://doi.org/10.1186/s12909-026-09709-3
