Background: Oral lesions are common clinical findings that frequently cause patient anxiety and prompt individuals to seek online information. Artificial intelligence (AI)-driven chatbots, such as ChatGPT, Google Gemini, and Microsoft Copilot, are increasingly utilized for immediate guidance; however, their reliability, accuracy, and safety in addressing oral lesion-related queries remain uncertain.

Objective: This study aimed to evaluate and compare the performance of ChatGPT, Google Gemini, and Microsoft Copilot in responding to patient queries on oral lesions, with emphasis on accuracy, relevance, clarity, safety, transparency, and readability.

Methods: Twenty patient-centered questions were curated from reputable health sources and public forums. Each question was entered into the three chatbots under standardized conditions. Four calibrated observers independently rated the responses using a structured five-point Likert scale. Readability was analyzed using the Flesch Reading Ease (FRE) and Flesch–Kincaid Grade Level (FKGL) indices.

Results: Google Gemini and ChatGPT outperformed Microsoft Copilot, with significant differences observed in accuracy (P = 0.022) and safety (P < 0.001). Inter-rater agreement was highest for Copilot (κ ≈ 0.8), while ChatGPT demonstrated the best readability (FKGL = 6.58, FRE = 59.64).

Conclusion: ChatGPT and Google Gemini demonstrated superior performance compared to Microsoft Copilot. While ChatGPT offered more readable responses, Gemini provided more comprehensive but complex content. Continuous refinement and domain-specific training are essential to enhance their clinical reliability and ensure patient safety.
Balajee et al. studied this question.
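The two readability indices named in the Methods can be illustrated with a minimal sketch. The FRE and FKGL formulas below are the standard published ones. The syllable counter and the sentence and word splitting are rough heuristics assumed here for illustration only, and the function names are hypothetical; the abstract does not state which tool the authors used, so scores from this sketch may differ from theirs.

```python
import re


def count_syllables(word: str) -> int:
    """Rough heuristic (an assumption, not a dictionary lookup):
    count vowel groups and discount a silent trailing 'e'."""
    word = word.lower()
    groups = re.findall(r"[aeiouy]+", word)
    count = len(groups)
    if word.endswith("e") and count > 1:
        count -= 1
    return max(count, 1)


def text_stats(text: str) -> tuple[int, int, int]:
    """Return (words, sentences, syllables) using naive splitting."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(count_syllables(w) for w in words)
    return len(words), len(sentences), syllables


def flesch_reading_ease(words: int, sentences: int, syllables: int) -> float:
    """Standard FRE formula: higher scores mean easier text."""
    return 206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words)


def flesch_kincaid_grade(words: int, sentences: int, syllables: int) -> float:
    """Standard FKGL formula: approximate US school grade level."""
    return 0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59


if __name__ == "__main__":
    # Hypothetical chatbot reply, used only to exercise the functions.
    reply = "Most mouth ulcers heal on their own. See a dentist if one lasts longer than two weeks."
    w, s, syl = text_stats(reply)
    print(f"FRE:  {flesch_reading_ease(w, s, syl):.2f}")
    print(f"FKGL: {flesch_kincaid_grade(w, s, syl):.2f}")
```

Both indices depend only on average sentence length and average syllables per word, so a response can score as more readable simply by using shorter sentences and shorter words, independent of its accuracy or safety.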