Aims: This study evaluated and compared the quality, accuracy, and readability of orthodontic retention-related information provided by five AI-based chatbot platforms: ChatGPT-4, Copilot, Gemini, MediSearch, and OpenEvidence. Methods: A set of 25 commonly asked patient-oriented questions on orthodontic retention was submitted to each chatbot. Responses were evaluated using five metrics: EQIP (Ensuring Quality Information for Patients), Reliability Score, Global Quality Score (GQS), SMOG Readability Index, and Similarity Index. Mean ± standard deviation values were calculated. Kruskal-Wallis and Dunn’s post-hoc tests were used to assess statistical differences between platforms. Results: MediSearch and OpenEvidence outperformed the other platforms in EQIP and Reliability scores. ChatGPT-4 generated the most original content, with the lowest Similarity Index. SMOG readability was significantly better for ChatGPT-4, while MediSearch and OpenEvidence produced more technically complex language. Statistically significant differences (p<0.05) were found between platforms in the EQIP, SMOG, and Similarity Index metrics. Conclusion: AI chatbots vary significantly in how well they deliver orthodontic retention guidance. While medical-specific tools offer superior accuracy and reliability, general-purpose models like ChatGPT-4 excel in readability and originality. The results highlight the importance of matching AI platform selection to patient communication goals.
Aşkın et al. studied this question.
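As an illustration of the SMOG Readability Index named in the Methods, the following is a minimal Python sketch of the standard SMOG grade formula, 1.0430 × sqrt(polysyllables × 30 / sentences) + 3.1291. It is not the study's own tooling, which the abstract does not specify. The function names `count_syllables` and `smog_grade` are hypothetical, and the vowel-group syllable counter is a rough heuristic assumed here for simplicity. SMOG was designed for samples of about 30 sentences, so grades for short chatbot answers should be read with caution.

```python
import math
import re


def count_syllables(word: str) -> int:
    """Estimate syllables by counting vowel groups (heuristic, not a dictionary lookup)."""
    lowered = word.lower()
    count = len(re.findall(r"[aeiouy]+", lowered))
    # Assumption: a trailing "e" is usually silent ("relapse"), except in "-le" ("table").
    if lowered.endswith("e") and not lowered.endswith("le") and count > 1:
        count -= 1
    return max(count, 1)


def smog_grade(text: str) -> float:
    """Return the SMOG grade of `text`; a higher grade means harder to read."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    if not sentences:
        raise ValueError("text must contain at least one sentence")
    words = re.findall(r"[A-Za-z]+", text)
    # Polysyllables are words with three or more syllables.
    polysyllables = sum(1 for w in words if count_syllables(w) >= 3)
    return 1.0430 * math.sqrt(polysyllables * 30 / len(sentences)) + 3.1291


if __name__ == "__main__":
    plain = "Wear it at night. Keep it clean."
    technical = "Orthodontic retention prevents relapse."
    print(f"plain: {smog_grade(plain):.2f}")
    print(f"technical: {smog_grade(technical):.2f}")
```

A lower grade corresponds to text that needs fewer years of schooling to understand, which is the sense in which one platform's responses can be called more readable than another's.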