Artificial intelligence (AI) language models are increasingly integrated into clinical and patient-centered information pathways, yet their accuracy in delivering condition-specific dental knowledge remains unclear. This comparative study evaluated the clinical accuracy of 3 widely used AI models (ChatGPT-4, Gemini, and Copilot) in providing information on impacted teeth. A total of 118 expert-generated open-ended questions were posed to each model, and responses were categorized into 5 predefined accuracy levels. Statistical analysis using the Pearson χ² or Fisher exact test (P ≤ 0.05) demonstrated that ChatGPT-4 produced the highest proportion of “Objectively True” responses (83.9%) and consistently outperformed Gemini and Copilot across all domains, including definitions, indications, procedural descriptions, contraindications, and complications. Gemini and Copilot more frequently generated incomplete or selectively accurate answers classified as “Selected Facts” or “Minimal Facts,” highlighting variability in their informational reliability. Overall, ChatGPT-4 exhibited superior clinical accuracy and appears to function as a more dependable supplementary resource for impacted tooth–related information, whereas the inconsistent performance of Gemini and Copilot underscores the continued need for expert oversight in patient education and clinical communication.
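As a rough illustration of the kind of comparison described above, the following sketch tests whether the proportion of “Objectively True” responses differs between two models. It is a simplification under stated assumptions, not the study's analysis: the 5 accuracy levels are collapsed into a 2×2 table, the ChatGPT-4 count of 99 is inferred from the reported 83.9% of 118 questions, the comparison count of 70 is a hypothetical placeholder rather than reported data, and the rule of switching to the Fisher exact test when any expected cell count falls below 5 is a common convention that the abstract does not confirm. All function names are illustrative.

```python
import math


def expected_counts(table):
    """Expected cell counts under independence of rows and columns."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[rt * ct / n for ct in col_totals] for rt in row_totals]


def pearson_chi2_2x2(table):
    """Pearson chi-square statistic and P value for a 2x2 table (no Yates correction)."""
    exp = expected_counts(table)
    stat = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, exp)
        for o, e in zip(obs_row, exp_row)
    )
    # With 1 degree of freedom, the chi-square survival function is erfc(sqrt(x / 2)).
    return stat, math.erfc(math.sqrt(stat / 2))


def fisher_exact_2x2(table):
    """Two-sided Fisher exact P value for a 2x2 table."""
    (a, b), (c, d) = table
    r1, r2, c1, n = a + b, c + d, a + c, a + b + c + d

    def prob(x):
        # Hypergeometric probability of x in the top-left cell, margins fixed.
        return math.comb(r1, x) * math.comb(r2, c1 - x) / math.comb(n, c1)

    p_obs = prob(a)
    lo, hi = max(0, c1 - r2), min(r1, c1)
    # Sum every table that is no more probable than the observed one.
    return sum(p for p in map(prob, range(lo, hi + 1)) if p <= p_obs * (1 + 1e-7))


def compare(table, min_expected=5):
    """Use Fisher when any expected count is small (assumed rule), else Pearson."""
    if min(min(row) for row in expected_counts(table)) < min_expected:
        return "fisher", fisher_exact_2x2(table)
    return "pearson", pearson_chi2_2x2(table)[1]


N_QUESTIONS = 118
CHATGPT_TRUE = 99       # inferred: 99 / 118 rounds to the reported 83.9%
PLACEHOLDER_TRUE = 70   # HYPOTHETICAL count for a second model, not from the study

table = [
    [CHATGPT_TRUE, N_QUESTIONS - CHATGPT_TRUE],
    [PLACEHOLDER_TRUE, N_QUESTIONS - PLACEHOLDER_TRUE],
]
method, p_value = compare(table)
print(f"{method} test, P = {p_value:.2g}, significant at 0.05: {p_value <= 0.05}")
```

The full analysis would compare all 5 accuracy levels across all 3 models, which requires a larger contingency table than this 2×2 sketch handles.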
Zincir et al. studied this question.