Large Language Models (LLMs) are widely used and accessible. We investigate whether publicly available LLMs provide useful, safe, and accurate information to non-expert members of the general community seeking answers about moyamoya. ChatGPT-4o and Gemini 1.5 Flash were each given single-shot prompts for ten frequently asked questions about moyamoya. A survey of these responses was posted on the Moyamoya Foundation website for ten weeks. Respondents were randomly assigned to read either the ChatGPT-generated or the Gemini-generated responses. Clinicians treating cerebrovascular disease evaluated the safety and accuracy of all responses. Community respondents evaluated 27 sets of ChatGPT output and 20 sets of Gemini output. Output length differed significantly between the two models (p < 0.001); 1.2% of ChatGPT answers and 20.8% of Gemini answers were reported as “short.” The LLMs failed to address potential risks of the procedures and medications they mentioned (ChatGPT 38%, Gemini 28.6%). Responses also omitted when self-care strategies become insufficient and a medical professional should be consulted (ChatGPT 27.2%, Gemini 40.8%). However, community respondents felt that the LLM answers were of similar quality to (ChatGPT 47.8%, Gemini 49%) or somewhat better than (ChatGPT 24.4%, Gemini 22.4%) answers received from their physicians. Physicians evaluating the same LLM outputs reported that the answers failed to address recent advances and research within the field (ChatGPT 57.5%, Gemini 62.5%) and failed to address urgent symptoms warranting referral to higher levels of care (ChatGPT 70.0%, Gemini 70.0%). LLM responses are perceived as being of similar quality to those of a physician, but limitations remain regarding safety, omission of data, and their impact on patient-physician relationships.
Ruppert-Gomez et al. studied this question.