Aim: This study compared the accuracy of responses provided by different AI-based conversational systems to patient questions regarding endodontic pain and antibiotic use.

Methods: Twenty clinical scenarios related to endodontic pain and antibiotic use were prepared. Ten scenarios represented clinical conditions in which antibiotic use was indicated, and the other ten represented conditions in which it was not. Each scenario was submitted to four AI-based systems: ChatGPT (OpenAI, San Francisco, USA), DeepSeek (DeepSeek AI, Hangzhou, China), Gemini (Google, Mountain View, USA), and Copilot (Microsoft, Redmond, USA). A new session was initiated for each scenario in each system, and the response was recorded. An endodontic specialist rated each response on a 3-point scale according to how well it reflected the indication for antibiotic use (1 = incorrect, 2 = partially correct, 3 = correct). The Wilcoxon signed-rank test and the Kruskal-Wallis test were used for data analysis, with the significance level set at p < 0.05.

Results: Because each of the 20 clinical scenarios was submitted to four AI-based conversational systems, a total of 80 responses were evaluated. Of these, 56 were classified as correct and 24 as partially correct; no response was classified as incorrect. Each system performed similarly in scenarios where antibiotic use was indicated and in scenarios where it was not: the difference between the two scenario types was not statistically significant for ChatGPT, DeepSeek, Gemini, or Copilot (p = 0.317, p = 0.564, p = 0.317, and p = 0.102, respectively). Overall performance also did not differ significantly among the AI systems (H = 3.292; p = 0.349).

Conclusion: The evaluated AI-based conversational systems generally provided correct or partially correct responses to patient questions related to endodontic pain and antibiotic use. No statistically significant difference in performance was found among the systems. These findings suggest that AI-based systems may have supportive potential in providing information to patients. Nevertheless, because some responses were incomplete or ambiguous, these systems should not replace expert evaluation.
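The figures reported in the Results can be sanity-checked with a few lines of Python. The sketch below is illustrative only: it uses the counts and the H statistic quoted in the abstract, and it assumes the Kruskal-Wallis p-value was obtained from the usual chi-square approximation with k − 1 = 3 degrees of freedom for four systems, which the abstract does not state explicitly. The function names are hypothetical helpers, not part of the study.

```python
import math

# Counts taken from the Results: 20 scenarios x 4 systems = 80 responses.
N_SCENARIOS, N_SYSTEMS = 20, 4
COUNTS = {"correct": 56, "partially correct": 24, "incorrect": 0}


def percentages(counts):
    """Return each category's share of all responses, in percent."""
    total = sum(counts.values())
    return {label: 100 * n / total for label, n in counts.items()}


def chi2_sf_df3(x):
    """Upper-tail probability of a chi-square variable with 3 degrees of freedom.

    Closed form valid for df = 3 only:
    erfc(sqrt(x/2)) + sqrt(2x/pi) * exp(-x/2).
    """
    return math.erfc(math.sqrt(x / 2)) + math.sqrt(2 * x / math.pi) * math.exp(-x / 2)


# The category counts should add up to the number of evaluated responses.
assert sum(COUNTS.values()) == N_SCENARIOS * N_SYSTEMS

print(percentages(COUNTS))           # correct: 70.0, partially correct: 30.0, incorrect: 0.0
print(round(chi2_sf_df3(3.292), 3))  # assumed chi-square approximation, df = 3
```

Under that assumption, the computed tail probability for H = 3.292 agrees with the reported p = 0.349 to three decimals. The per-system scores are not given in the abstract, so the H statistic itself and the Wilcoxon p-values cannot be recomputed here.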
Arslanparcasi et al. (Fri,) studied this question.