Abstract

Background
Medical education relies on experience to develop clinical reasoning skills, yet patient access is often limited. Large language models (LLMs) offer an accessible alternative for simulating patient encounters. We aimed to develop and evaluate an LLM-based clinical simulation system for gastroenterology that simulates entire clinical encounters in natural language, from history-taking through to treatment and follow-up planning.

Methods
The study was conducted in three phases. First, a system comprising an LLM-based clinical case generator, an LLM-based patient simulator, and a web chat interface was developed and iteratively refined. Second, two gastroenterology fellows each performed 20 clinical case simulations as initial testers, providing feedback that guided further system adjustments. Third, six independent gastroenterology fellows, blinded to the development process, evaluated the final system by completing the same 20 clinical simulations. All evaluators rated the system on a 5-point Likert scale across four domains: case presentation accuracy, adaptability to user interactions, realism, and educational value.

Results
A total of 160 evaluations were collected across 20 cases and 8 raters. At final evaluation, all domains scored highly: educational value (mean 4.57 ± 0.63), accuracy (4.42 ± 0.63), adaptability (4.33 ± 0.75), and realism (4.32 ± 0.81). Scores differed significantly across domains (p < 0.01), with educational value rated highest. In an exploratory comparison, evaluation panellists rated all domains significantly higher than initial testers (all p < 0.01).

Conclusions
In this pilot evaluation, the LLM-based clinical simulator demonstrated high accuracy, realism, adaptability, and educational value for gastroenterology case simulation, and may represent an accessible and cost-effective tool for clinical reasoning training.
Maimaris et al. studied this question.