What question did this study set out to answer?

This study examines how well large language models provide accurate information about moyamoya to non-experts.

February 28, 2026

Open Access

Patient perspective on large-language model responses to questions about Moyamoya

Key Points

LLMs were prompted with ten common questions about moyamoya.
A survey of the responses was posted on the Moyamoya Foundation website for ten weeks.
Community respondents rated outputs from ChatGPT and Gemini, while clinicians assessed safety and accuracy.
Statistical comparisons were made regarding output length and response sufficiency.
Output length varied significantly between the models (p < 0.001).
Potential risks of the procedures and medications mentioned went unaddressed (ChatGPT 38%, Gemini 28.6%).
Omissions regarding when to seek medical consultation were noted (ChatGPT 27.2%, Gemini 40.8%).
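Community respondents mostly rated LLM answers as similar in quality to, or somewhat better than, answers from their physicians.
Physicians found that the answers omitted urgent symptoms warranting referral to higher levels of care and recent advances in the field.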

Abstract

Large language models (LLMs) are widely used and accessible. We investigate whether publicly available LLMs provide useful, safe, helpful, and accurate information to the non-expert general community seeking answers about moyamoya. ChatGPT-4o and Gemini 1.5 Flash were directly single-shot prompted with ten frequently asked questions about moyamoya. A survey of these responses was posted on the Moyamoya Foundation website for ten weeks. Respondents were randomly assigned to read either ChatGPT- or Gemini-generated responses. Clinicians treating cerebrovascular disease evaluated the safety and accuracy of all responses. Community respondents evaluated 27 sets of ChatGPT output and 20 sets of Gemini output. Output length was significantly different (p < 0.001). Of the ChatGPT and Gemini answers, 1.2% and 20.8%, respectively, were reported as “short.” The LLMs failed to address potential risks of the procedures and medications they mentioned (ChatGPT 38%, Gemini 28.6%). Responses omitted when self-care strategies become insufficient and a medical professional should be consulted (ChatGPT 27.2%, Gemini 40.8%). However, community respondents felt LLM answers were of similar quality to (ChatGPT 47.8%, Gemini 49%) or somewhat better than (ChatGPT 24.4%, Gemini 22.4%) those received from their physicians. Physicians evaluating the same LLM outputs reported that the answers failed to address recent advances and research within the field (ChatGPT 57.5%, Gemini 62.5%) and failed to address urgent symptoms warranting referral to higher levels of care (ChatGPT 70.0%, Gemini 70.0%). LLM responses are perceived as being of similar quality to a physician's, but limitations remain regarding safety, omission of data, and their impact on patient-physician relationships.

Cite This Study

Ruppert-Gomez et al. studied this question.

synapsesocial.com/papers/69a286490a974eb0d3c0122b https://doi.org/10.1007/s00701-025-06743-w