August 15, 2025

Are ChatGPT's Responses to Urologic Inquiries Readable and Supported by AUA Guidelines?

Key Points

ChatGPT's responses were supported by AUA guidelines 66% of the time across various urologic inquiries.
Quality assessment rated only 37% of responses as high quality; responses on infertility topics scored highest.
The average readability score indicated responses were at a college graduate level, limiting patient accessibility.
Future iterations of ChatGPT could enhance its ability to provide clearer and more patient-friendly urologic health information.

Abstract

Are ChatGPT 3.5's responses to patient inquiries about urologic health conditions (1) supported by the American Urological Association's guidelines and (2) readable and accessible to patients?

Artificial intelligence technology continues to increase in popularity, but it must still be heavily vetted to ensure safety and accuracy prior to clinical implementation. ChatGPT has had varying success in accurately answering medical questions. We wanted to see if the chatbot's responses to urologic inquiries were conveyed in a patient‐friendly manner and supported by the American Urological Association's guidelines. Our results were compared to those of prior studies looking at ChatGPT's performance on the United States Medical Licensing Examination and American Urological Association Self‐Assessment Study Programme.

ChatGPT's responses to inquiries were compared to guideline statements set forth by the American Urological Association on its website. In this qualitative experiment, 30 prompts were written from a patient's perspective covering multiple urologic domains. The prompts were posed to ChatGPT 3.5, with responses recorded verbatim and graded with a Support Score (SS) and a Quality Score (QS) by eight evaluators: five board‐certified urologists and three current urology residents. Readability of the responses was assessed with Flesch–Kincaid Readability Grade Level scores, and statistical analysis was performed with Stata version 15.1.

Of ChatGPT's 30 responses, 20 (66%) were supported by the American Urological Association's guidelines (median SS of 4, IQR 3–5), although responses to oncology questions were less often supported (5/12 supported). Eleven of 30 (37%) responses were deemed high quality (median QS of 4, IQR 3–5), with responses related to infertility having the highest quality (3/4). The average Flesch–Kincaid Readability Grade Level score across all domains was 18, equivalent to a college graduate reading level.

Most responses from ChatGPT 3.5 to urologic inquiries were supported by current American Urological Association guidelines, but the majority were of low overall quality. Responses were at a college graduate reading level, making them inaccessible to most patients. ChatGPT 3.5 has limitations in its ability to answer urologic health questions in a patient‐friendly manner, but future versions may improve its utility.
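
For readers unfamiliar with the readability metric, the Flesch–Kincaid Grade Level is computed from average sentence length and average syllables per word. The sketch below is only an illustration of the standard formula. The abstract does not state which tool the authors used to score readability, so the function names and the vowel-group syllable heuristic here are assumptions, and a dedicated readability tool may produce somewhat different scores.

```python
import re


def count_syllables(word: str) -> int:
    """Rough syllable estimate (assumed heuristic, not the study's method):
    count vowel groups and discount a trailing silent 'e'."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    if word.endswith("e") and not word.endswith(("le", "ee")) and count > 1:
        count -= 1
    return max(count, 1)


def fk_grade(words: int, sentences: int, syllables: int) -> float:
    """Standard Flesch-Kincaid Grade Level formula from raw counts."""
    if words <= 0 or sentences <= 0:
        raise ValueError("words and sentences must be positive")
    return 0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59


def fk_grade_from_text(text: str) -> float:
    """Estimate the grade level of a passage using the heuristics above."""
    # Split on runs of sentence-ending punctuation; drop empty fragments.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    # Treat runs of letters (with an optional apostrophe part) as words.
    words = re.findall(r"[A-Za-z]+(?:'[A-Za-z]+)?", text)
    syllables = sum(count_syllables(w) for w in words)
    return fk_grade(len(words), len(sentences), syllables)
```

Because the formula rewards short sentences and short words, a score of 18 implies long sentences packed with polysyllabic terms, which is consistent with the college graduate reading level reported above.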
