Objective: To compare the accuracy and consistency of five large language models (LLMs) in generating responses about dental trauma.
Materials and methods: […] = 0.05), alongside calculation of sensitivity, specificity, accuracy, and area under the receiver operating characteristic (ROC) curve (AUC) based on the 60-item set. Temporal stability was assessed using the intraclass correlation coefficient (ICC).
Results: […] 0.90).
Conclusion: All evaluated LLMs, particularly Copilot and DeepSeek, demonstrated high accuracy in providing information on dental trauma, with stable performance over time. The use of a context prompt did not significantly affect accuracy or stability.
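The abstract names the metrics but not how they were computed. The following is a minimal sketch, not the authors' analysis code. It assumes each of the 60 items is scored as a binary true/false answer against a reference key, and that temporal stability is measured with ICC(2,1) (two-way random effects, absolute agreement, single measure). The abstract states neither the item format nor the ICC form, so both are assumptions, and the function names `binary_metrics` and `icc_2_1` are hypothetical.

```python
from typing import Dict, Sequence


def binary_metrics(truth: Sequence[bool], pred: Sequence[bool]) -> Dict[str, float]:
    """Sensitivity, specificity, accuracy and single-threshold AUC.

    Assumption: truth[i] is the reference label of item i (True = correct
    statement) and pred[i] is the model's True/False answer. Both classes
    must occur in `truth`, otherwise a denominator is zero.
    """
    if len(truth) != len(pred) or not truth:
        raise ValueError("truth and pred must be non-empty and equally long")
    pairs = list(zip(truth, pred))
    tp = sum(1 for t, p in pairs if t and p)
    tn = sum(1 for t, p in pairs if not t and not p)
    fp = sum(1 for t, p in pairs if not t and p)
    fn = sum(1 for t, p in pairs if t and not p)
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    accuracy = (tp + tn) / len(pairs)
    # Trapezoidal ROC through (0, 0), (1 - spec, sens), (1, 1): with a single
    # binary operating point its area simplifies to (sens + spec) / 2.
    auc = (sensitivity + specificity) / 2
    return {"sensitivity": sensitivity, "specificity": specificity,
            "accuracy": accuracy, "auc": auc}


def icc_2_1(scores: Sequence[Sequence[float]]) -> float:
    """ICC(2,1): two-way random effects, absolute agreement, single measure.

    Assumption: scores[i][j] is the score of item (row) i in session (column) j,
    e.g. the same questions asked at different time points.
    """
    n, k = len(scores), len(scores[0])
    grand = sum(sum(row) for row in scores) / (n * k)
    row_means = [sum(row) / k for row in scores]
    col_means = [sum(row[j] for row in scores) / n for j in range(k)]
    # Two-way ANOVA sums of squares: rows, columns, residual.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ss_err = ss_total - ss_rows - ss_cols
    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_err = ss_err / ((n - 1) * (k - 1))
    return (ms_rows - ms_err) / (
        ms_rows + (k - 1) * ms_err + k * (ms_cols - ms_err) / n)


if __name__ == "__main__":
    # Hypothetical toy data, not taken from the study.
    print(binary_metrics([True, True, True, False, False],
                         [True, True, False, False, True]))
    print(icc_2_1([[1, 1], [2, 2], [3, 3]]))  # identical sessions
```

The sketch uses only the standard library to stay self-contained; in practice, scikit-learn's `confusion_matrix` and `roc_auc_score` or pingouin's `intraclass_corr` are common choices, and the ICC form should match whatever the study actually reported.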
This study is by Lisboa et al.