This study aimed to evaluate the performance of three large language models (LLMs), namely ChatGPT-4.0, Claude 3.5 Sonnet, and DeepSeek R1, in answering multiple-choice questions (MCQs) on pediatric dentistry. The models' accuracy and the quality of their justifications were analyzed within the framework of Bloom's taxonomy.
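The abstract does not say how accuracy was tabulated. One common way to relate MCQ accuracy to Bloom's taxonomy is to tag each question with a cognitive level and compute the fraction of correct answers per model and per level. The sketch below illustrates that stratified calculation under these assumptions. The `Response` record, the `accuracy_by_model_and_level` function, the level labels (taken from the revised taxonomy: Remember, Understand, Apply, and so on), and the placeholder model names are hypothetical. The toy records are not data from the study.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Response:
    """One model's answer to one MCQ (hypothetical record layout)."""
    model: str        # placeholder model name, e.g. "ModelA"
    bloom_level: str  # assumed Bloom's level label, e.g. "Remember", "Apply"
    correct: bool     # True if the chosen option matches the answer key


def accuracy_by_model_and_level(responses):
    """Return {(model, bloom_level): fraction of questions answered correctly}."""
    totals = defaultdict(int)  # number of questions seen per (model, level)
    hits = defaultdict(int)    # number answered correctly per (model, level)
    for r in responses:
        key = (r.model, r.bloom_level)
        totals[key] += 1
        hits[key] += int(r.correct)
    return {key: hits[key] / totals[key] for key in totals}


if __name__ == "__main__":
    # Toy records for illustration only; these are NOT results from the study.
    demo = [
        Response("ModelA", "Remember", True),
        Response("ModelA", "Remember", False),
        Response("ModelA", "Apply", True),
        Response("ModelB", "Remember", True),
    ]
    for (model, level), acc in sorted(accuracy_by_model_and_level(demo).items()):
        print(f"{model:8s} {level:10s} {acc:.2f}")
```

Keying on the `(model, level)` pair keeps every combination separate, so the same table supports both per-model and per-level comparisons. Justification quality would need a separate rubric, and the abstract does not describe one.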
The study was conducted by Mukhopadhyay et al.