Large language models (LLMs) are increasingly used to provide health-related information, yet their clinical reliability remains uncertain. This study evaluated the accuracy, comprehensiveness, and readability of LLM responses offering evidence-based guidance on early maxillary expansion for children in the primary and mixed dentition phases, assessing their potential as trustworthy resources for parental decision-making. Eight LLMs (DeepSeek V3, Gemini 2.5 Flash, Claude 4.5 Sonnet, MediSearch, Copilot, GPT-5, GPT-4o, and Grok) each answered 20 questions reflecting common parental concerns about early maxillary expansion, 10 per dentition phase (primary and mixed). Responses were evaluated for accuracy and comprehensiveness, and readability was assessed using the Flesch Reading Ease Score (FRES) and Flesch-Kincaid Grade Level (FKGL). Statistical analyses included descriptive statistics and appropriate parametric and non-parametric tests based on data distribution, with significance set at p < 0.05. Significant differences were observed among the LLMs in accuracy, comprehensiveness, and readability (p < 0.001). DeepSeek V3 and Grok achieved the highest scores for both accuracy and comprehensiveness across both dentition phases, with DeepSeek V3 also demonstrating the highest readability in the mixed dentition phase. Copilot, GPT-5, and GPT-4o produced the most readable outputs, as indicated by their highest FRES and lowest FKGL scores, though their content accuracy was comparatively lower. In contrast, MediSearch, Gemini 2.5 Flash, and Claude 4.5 Sonnet showed consistently weaker performance across all evaluated criteria. Although some LLMs can offer reliable and understandable information about early maxillary expansion, none consistently excels across all evaluated criteria.
In healthcare contexts, integrating scientific accuracy with readability is crucial for supporting informed parental decision-making, enhancing overall health literacy, and strengthening patient–clinician communication. These findings indicate that LLMs should serve as supplementary tools that support, rather than replace, professional orthodontic guidance.
Çiçek et al. (Sat,) studied this question.
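For readers unfamiliar with the two readability indices named in the abstract, the sketch below shows how FRES and FKGL are conventionally computed from word, sentence, and syllable counts. It uses the standard published coefficients. The function names and the example counts are illustrative assumptions, and the abstract does not state which tool the authors used to obtain their scores.

```python
def flesch_reading_ease(words: int, sentences: int, syllables: int) -> float:
    """Flesch Reading Ease Score: higher values mean easier text."""
    return 206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words)


def flesch_kincaid_grade(words: int, sentences: int, syllables: int) -> float:
    """Flesch-Kincaid Grade Level: approximate US school grade needed to read the text."""
    return 0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59


# Hypothetical counts for one LLM response (not data from the study):
# 100 words in 5 sentences with 150 syllables.
fres = flesch_reading_ease(words=100, sentences=5, syllables=150)
fkgl = flesch_kincaid_grade(words=100, sentences=5, syllables=150)
print(f"FRES = {fres:.2f}, FKGL = {fkgl:.2f}")
```

Both formulas depend on the same two ratios, average sentence length and average syllables per word, so a response with shorter sentences and shorter words scores higher on FRES and lower on FKGL. This is the pattern the abstract reports for Copilot, GPT-5, and GPT-4o.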