STUDY DESIGN: Cross-sectional study.
OBJECTIVE: To evaluate whether the answers given by different versions of ChatGPT to frequently asked questions about adolescent idiopathic scoliosis (AIS), compiled from the patient education websites of the American Academy of Orthopaedic Surgeons (AAOS) and the Scoliosis Research Society (SRS), provide appropriate and sufficient information to patients.
SUMMARY OF BACKGROUND DATA: Artificial intelligence chatbots have gained popularity in medicine due to their ability to analyze substantial scientific data using machine learning techniques and generate human-like responses. These responses can guide patients and families who are seeking information online after a diagnosis of AIS.
METHODS: Thirty frequently asked questions, selected by expert spine surgeons, were posed to 3 versions of ChatGPT using a new internet browser window for each question, and the responses were recorded. Three orthopedic spine surgeons graded the accuracy of the responses against the 2 selected expert websites using a Likert scale. Finally, the adequacy of each response for patient use was evaluated.
RESULTS: Median Likert scores for ChatGPT-3.5, ChatGPT-4, and ChatGPT-4o were 4 (1-5), 4 (2-5), and 4 (2-5), respectively. No significant differences were observed among versions within individual categories (all P>0.05). However, a significant difference was found in the overall response scores (P=0.004). Post hoc analysis revealed that ChatGPT-4o achieved significantly higher accuracy than ChatGPT-3.5 (P=0.005, Bonferroni-adjusted), whereas the other pairwise comparisons were not significant. When the adequacy of the responses was evaluated, 26/30 (86%) of ChatGPT-3.5 responses were acceptable for patient use, whereas ChatGPT-4 and ChatGPT-4o each provided appropriate responses to 29/30 (96%) of the questions.
CONCLUSIONS: Successive ChatGPT versions demonstrated improved response reliability, with ChatGPT-4o showing a statistically significant advantage over ChatGPT-3.5. Given that ChatGPT-4 and ChatGPT-4o provided accurate and patient-appropriate answers in 96% of cases, these tools may assist in online patient education under clinician supervision.
LEVEL OF EVIDENCE: Level III.
Özgür et al. studied this question.