August 16, 2025

Prompt design and comparing large language models for healthcare simulation case scenarios

Key Points

ChatGPT-4 generated higher-quality scenarios than Bing Precise and Bard, with mean scores of 71.25 for zero-shot prompting and 85.09 for prompt chaining.
All scenarios were rated with the Simulation Scenario Evaluation Tool, which revealed marked quality differences across models.
The mixed-methods exploratory design combined quantitative scores with a qualitative content analysis of the generated scenarios.
Findings indicate that careful prompt design can enhance scenario quality in healthcare simulations.

Abstract

Background: Large language models (LLMs) such as ChatGPT, Bing and Bard have shown promise in a range of applications, but their potential for designing healthcare simulation scenarios remains largely unexplored. With the wide adoption of simulation-based education (SBE), there is an opportunity to use these LLMs to streamline the creation of simulation scenarios. This study compares the quality of scenarios generated by different LLMs and examines how their responses vary with the prompting technique.

Methods: Using a mixed-methods exploratory sequential comparative design, we quantitatively and qualitatively compared 90 simulation case scenarios generated by ChatGPT-4, Bing Precise and Bard. Scenarios were generated with two prompting techniques: zero-shot prompting and prompt chaining. The quality of every scenario was rated with the Simulation Scenario Evaluation Tool.

Results: ChatGPT-4 scored highest under both techniques, with mean scores of 71.25 for zero-shot scenarios and 85.09 for prompt-chaining scenarios, compared with Bard (58.40 and 44.27) and Bing Precise (48.67 and 39.65). Qualitative content analyses provided further insight into the quality of the scenarios.

Conclusions: The findings show marked differences in scenario quality across and between models, underscoring the need for targeted prompt design. This study demonstrates both the limitations and the potential of LLMs for generating healthcare simulation case scenarios.

Authors

Sara Maaz

Sadek Obeidat

Cynthia J. Mosher