December 15, 2025 | Open Access

Performance of five free large language models in dental trauma: a 30-day longitudinal benchmark study

Abstract

Objective: To compare the accuracy and consistency of five large language models (LLMs) in generating responses about dental trauma. Materials and methods: [...] = 0.05), alongside calculation of sensitivity, specificity, accuracy, and area under the ROC curve (AUC) based on the 60-item set. Temporal stability was assessed using the intraclass correlation coefficient (ICC). Results: [...] 0.90). Conclusion: All evaluated LLMs, particularly Copilot and DeepSeek, demonstrated high accuracy in providing information on dental trauma, with stable performance over time. The use of a context prompt did not significantly affect accuracy or stability.
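
The abstract names sensitivity, specificity, and accuracy without defining them. The following is a minimal sketch of how such metrics are commonly computed for a binary item set; it is not the authors' analysis code. It assumes, for illustration only, that each of the 60 items has a true/false reference answer. The function name `binary_metrics` and all data below are hypothetical and do not come from the study.

```python
def binary_metrics(truth, predicted):
    """Return sensitivity, specificity, and accuracy for paired boolean labels."""
    if len(truth) != len(predicted):
        raise ValueError("truth and predicted must have the same length")

    pairs = list(zip(truth, predicted))
    tp = sum(1 for t, p in pairs if t and p)          # true items answered as true
    tn = sum(1 for t, p in pairs if not t and not p)  # false items answered as false
    fp = sum(1 for t, p in pairs if not t and p)      # false items answered as true
    fn = sum(1 for t, p in pairs if t and not p)      # true items answered as false

    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("both classes must be present in the reference labels")

    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "accuracy": (tp + tn) / len(pairs),
    }


# Hypothetical 60-item reference key: 30 true statements, 30 false statements.
truth = [True] * 30 + [False] * 30
# Hypothetical model answers: 3 true items missed, 2 false items wrongly endorsed.
predicted = [True] * 27 + [False] * 3 + [True] * 2 + [False] * 28

metrics = binary_metrics(truth, predicted)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

With these made-up answers the counts are 27 true positives, 3 false negatives, 2 false positives, and 28 true negatives, so 55 of the 60 items are answered correctly.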

Cite This Study

Lisboa et al. (2025) studied this question.

synapsesocial.com/papers/6a0889d0df3db87398109ea3 https://doi.org/10.3389/froh.2025.1737114
