PURPOSE: Electronic messaging has been associated with increased physician burnout, and message volume has only grown since the COVID-19 pandemic. We aimed to assess the efficacy of ChatGPT, an artificial intelligence (AI) chatbot, compared with healthcare providers in answering patient-directed messages related to breast reconstruction.

METHODS: Ten de-identified questions about breast reconstruction were extracted from electronic messages and presented to ChatGPT3, ChatGPT4, physicians, and advanced practice providers for responses. ChatGPT3 and ChatGPT4 were also prompted to give brief responses. Accuracy was graded by an expert and empathy by two medical students, each on a 1-5 Likert scale. Readability was measured using the Flesch Reading Ease (FRE) score. Scores were compared using two-tailed t-tests.

RESULTS: Eighty responses were recorded and analyzed. FRE was higher (more readable) for combined provider responses than for combined AI responses (53.33±13.27 vs. 35.97±11.62, p<.001) and brief-AI responses (53.33±13.27 vs. 34.68±12.81, p<.001). However, providers overall received lower empathy scores than all AI responses (2.013±0.74 vs. 2.875±0.75, p<.001) and brief-AI responses (2.013±0.74 vs. 2.875±0.75, p<.001). Furthermore, while advanced practice providers had accuracy scores similar to brief-AI responses (4.50±0.99 vs. 4.75±0.44, p=.333), physicians had lower accuracy (4.16±1.07 vs. 4.75±0.44, p=.035).

CONCLUSION: These findings offer insight into AI chatbot efficacy and suggest a complementary role for AI in healthcare, particularly in delivering empathetic and accurate responses. The readability of AI-generated information needs further refinement.
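To make the readability comparison concrete: the Flesch Reading Ease score is conventionally computed as 206.835 - 1.015 × (words / sentences) - 84.6 × (syllables / words), so higher scores mean easier text, which is why the providers' mean of 53.33 counts as more readable than the AI mean of 35.97. The sketch below is illustrative only. The abstract does not say which software the authors used, and tools count syllables differently, so this hypothetical helper takes pre-computed counts rather than guessing at a syllable heuristic.

```python
def flesch_reading_ease(words: int, sentences: int, syllables: int) -> float:
    """Return the Flesch Reading Ease score from pre-computed text counts.

    Hypothetical helper for illustration; higher scores mean easier text.
    Syllable counting is left to the caller because heuristics vary by tool.
    """
    if words <= 0 or sentences <= 0:
        raise ValueError("words and sentences must be positive")
    words_per_sentence = words / sentences
    syllables_per_word = syllables / words
    return 206.835 - 1.015 * words_per_sentence - 84.6 * syllables_per_word


# Hypothetical counts for one response: 100 words, 5 sentences, 150 syllables.
score = flesch_reading_ease(words=100, sentences=5, syllables=150)
print(f"{score:.1f}")  # prints 59.6
```

Longer sentences and more polysyllabic words both lower the score, which is consistent with the longer, more technical AI responses scoring lower.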
Soroudi et al. studied this question.