Objective: To evaluate the performance of two large language models (LLMs), ChatGPT version 4.0 (GPT-4) and Gemini Advanced (Gemini), in answering common patient questions after gynecologic surgery with regard to accuracy, relevance, helpfulness, and readability.
Methods: In this cross-sectional study, the two LLMs were prompted to generate answers to post-operative patient questions after gynecologic surgery. The questions were developed to simulate common patient concerns, based on expert opinion and on posts compiled from anonymous users on Reddit (r/endometriosis). Six topics were emphasized: endometriosis, vaginal bleeding, bowel/bladder function, incision care, resumption of activities, and sexual function. Questions were submitted systematically, with the memory reset after each query. Responses were blinded and independently assessed for accuracy and relevance on a 5-point Likert scale by four board-certified gynecologic surgeons with fellowship training in gynecologic surgery. Readability was calculated with the Flesch-Kincaid grade level. Responses were also evaluated by three clinic nurses.
Results: Forty-one questions were each posed to GPT-4 and Gemini three times. The responses were independently evaluated by four surgeons and three nurses, yielding a total of 1,968 evaluations for accuracy, relevance, helpfulness to the average patient, and readability. Surgeons and nurses graded Gemini responses as more accurate (4.23 vs 4.03, p=0.015) and more helpful (4.37 vs 4.21, p=0.025) than GPT-4 responses. Responses from both models were similarly rated as relevant or very relevant (4.45 vs 4.36, p=0.2). Most responses by GPT-4 (85%) and Gemini (87%) were consistent across all questions. The average reading levels of GPT-4 and Gemini responses were 11th and 10th grade, respectively, both above the recommended 6th grade reading level for patient information.
Conclusion: GPT-4 and Gemini provided overall accurate, relevant, and helpful responses to common post-operative patient questions for gynecologic surgery. Gemini outperformed GPT-4 in accuracy and helpfulness and had more readable responses.
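For readers unfamiliar with the readability metric used above, the Flesch-Kincaid grade level is computed as 0.39 × (words per sentence) + 11.8 × (syllables per word) − 15.59. The abstract does not state which tool the authors used, so the following is only a minimal sketch under stated assumptions: the function names `count_syllables` and `flesch_kincaid_grade` are hypothetical, and the syllable counter is a rough vowel-group heuristic rather than a dictionary-based count, so scores may differ from those of established readability software.

```python
import re


def count_syllables(word: str) -> int:
    """Estimate syllables by counting vowel groups (heuristic, an assumption)."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    # Discount a trailing silent 'e' (e.g. "care"), but not "-le" (e.g. "table").
    if word.endswith("e") and not word.endswith("le") and count > 1:
        count -= 1
    return max(count, 1)


def flesch_kincaid_grade(text: str) -> float:
    """Return the Flesch-Kincaid grade level of an English text."""
    # Sentences are split on runs of terminal punctuation.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    # Words are runs of letters, optionally with one internal apostrophe.
    words = re.findall(r"[A-Za-z]+(?:'[A-Za-z]+)?", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence and one word")
    syllables = sum(count_syllables(w) for w in words)
    return (
        0.39 * len(words) / len(sentences)
        + 11.8 * syllables / len(words)
        - 15.59
    )


if __name__ == "__main__":
    plain = "Keep the cut clean and dry. Call us if it turns red."
    dense = "Postoperative complications necessitate immediate gynecologic consultation."
    print(round(flesch_kincaid_grade(plain), 2))
    print(round(flesch_kincaid_grade(dense), 2))
```

Short sentences built from one-syllable words score low, while long sentences with polysyllabic clinical terms score high, which is consistent with both models landing above the recommended 6th grade level.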
Petra Voigt
Rhea Sharma
Angela Chaudhari
Applied Clinical Informatics
Northwestern University
Voigt et al. studied this question.
www.synapsesocial.com/papers/699fe36b95ddcd3a253e7390 — DOI: https://doi.org/10.1055/a-2818-1611