April 15, 2024

MP23-02 Evaluating Accuracy and Readability Characteristics of ChatGPT Responses to Common Questions About Pelvic Support Problems, Incontinence, and Urinary Tract Infections

Abstract

Journal of Urology, Urodynamics/Lower Urinary Tract Dysfunction/Female Pelvic Medicine: Female Incontinence (MP23), 1 May 2024

Alice H. Linder, Irene Su, Doreen E. Chung, and Gina M. Badalato

https://doi.org/10.1097/01.JU.0001008776.99097.8a.02

INTRODUCTION AND OBJECTIVE: Generative artificial intelligence (AI) chatbots such as ChatGPT are emerging as valuable tools for providing comprehensive medical information to the public, but much remains unknown about the quality of the content they generate. This study aimed to determine the accuracy and readability of ChatGPT responses to questions about urogynecologic issues.

METHODS: Forty-nine frequently asked questions (FAQs) in three categories (pelvic support problems [PSP], urinary incontinence [UI], and urinary tract infections [UTI]) were compiled from the American College of Obstetricians and Gynecologists website. Each FAQ was posed to ChatGPT in two separate sessions, generating a total of 98 responses. Questions were classified by type as information, diagnosis, or treatment questions. Two independent reviewers assessed the accuracy of each response using precision, TP/(TP + FP), a measure of the spread of information, and recall, TP/(TP + FN), a measure of the comprehensiveness of the answer, where TP, FP, and FN denote true positives, false positives, and false negatives. Cosine similarity between the responses from the two sessions was calculated in Python as a proxy for reproducibility. Readability was assessed with a Flesch-Kincaid calculator.

RESULTS: Precision was low across all three question types and categories, averaging 0.31 (SD 0.13) for PSP, 0.31 (SD 0.12) for UI, and 0.23 (SD 0.13) for UTI.
Recall was high across all question categories, averaging 0.73 (SD 0.17) for PSP, 0.73 (SD 0.14) for UI, and 0.70 (SD 0.26) for UTI (Table 1). Incorrect information was present in 0.0% of PSP, 23.1% of UI, and 4.2% of UTI responses. ChatGPT responses were longer on average than the reference answers (413 vs. 77 words). All (100%) ChatGPT responses were written at a college reading level or higher, compared with 50% (PSP), 53.8% (UI), and 16.7% (UTI) of the reference answers. Cosine similarity was 0.90 for PSP, 0.89 for UI, and 0.89 for UTI.

CONCLUSIONS: ChatGPT provides comprehensive answers to questions about common urogynecologic concerns, but with low precision and some incorrect information. Its answers also had low readability, requiring college-level or higher reading comprehension. Patients should be cautioned that ChatGPT-generated urogynecologic information is not yet credible enough to rely on. Moreover, to be clinically useful to patients, ChatGPT inputs must be optimized to generate concise, comprehensible information.

Source of Funding: None

© 2024 by American Urological Association Education and Research, Inc.

Journal of Urology, Volume 211, Issue 5S, May 2024, Page e381
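The METHODS paragraph reports precision, recall, cosine similarity, and a Flesch-Kincaid readability score without showing how they are computed. The Python sketch below is a minimal illustration, not the authors' code, and it assumes details the abstract does not give: TP, FP, and FN are taken to be already-counted items (for example, points in a ChatGPT response that do or do not match the reference answer), responses are compared as plain word-count vectors for cosine similarity (TF-IDF or embedding vectors would be equally plausible choices), and syllables are estimated with a simple vowel-group heuristic. All function names are hypothetical. The grade-level formula is the standard Flesch-Kincaid Grade Level, on which a score of about 13 or higher corresponds to college-level text.

```python
"""Illustrative sketch of the metrics named in METHODS (hypothetical, not the study's code)."""
import math
import re
from collections import Counter


def precision(tp: int, fp: int) -> float:
    """Precision = TP / (TP + FP); returns 0.0 when nothing was scored."""
    return tp / (tp + fp) if tp + fp else 0.0


def recall(tp: int, fn: int) -> float:
    """Recall = TP / (TP + FN); returns 0.0 when the reference has no items."""
    return tp / (tp + fn) if tp + fn else 0.0


def tokenize(text: str) -> list[str]:
    """Lowercase word tokens (assumed tokenizer: letters and apostrophes only)."""
    return re.findall(r"[a-z']+", text.lower())


def cosine_similarity(a: str, b: str) -> float:
    """Cosine of the angle between word-count vectors (assumed vectorization)."""
    va, vb = Counter(tokenize(a)), Counter(tokenize(b))
    dot = sum(va[w] * vb[w] for w in va.keys() & vb.keys())
    norm = math.sqrt(sum(c * c for c in va.values())) * math.sqrt(
        sum(c * c for c in vb.values())
    )
    return dot / norm if norm else 0.0


def count_syllables(word: str) -> int:
    """Rough syllable count: vowel groups, minus a silent final 'e' (heuristic)."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    if word.endswith("e") and not word.endswith("le") and count > 1:
        count -= 1
    return max(count, 1)


def fk_grade_level(text: str) -> float:
    """Flesch-Kincaid Grade Level:
    0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = tokenize(text)
    if not sentences or not words:
        return 0.0
    syllables = sum(count_syllables(w) for w in words)
    return 0.39 * len(words) / len(sentences) + 11.8 * syllables / len(words) - 15.59


if __name__ == "__main__":
    # Two made-up responses to the same FAQ from separate sessions.
    first = "Pelvic floor exercises can strengthen the muscles that support the bladder."
    second = "Pelvic floor exercises may help strengthen muscles supporting the bladder."
    # Arbitrary counts, only to show the calls.
    print(f"precision={precision(4, 9):.2f}  recall={recall(4, 1):.2f}")
    print(f"cosine similarity={cosine_similarity(first, second):.2f}")
    print(f"grade level={fk_grade_level(first):.1f}")
```

Running the file prints the three metrics for two invented responses. Because published readability calculators count syllables with dictionaries or different heuristics, grade levels from this sketch can differ from those produced by the calculator used in the study.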
