December 15, 2025 · Open Access

Performance of five free large language models in dental trauma: a 30-day longitudinal benchmark study

Abstract

Objective: To compare the accuracy and consistency of five large language models (LLMs) in generating responses about dental trauma. Materials and methods: [...] = 0.05), alongside calculation of sensitivity, specificity, accuracy, and area under the ROC curve (AUC) based on the 60-item set. Temporal stability was assessed using the intraclass correlation coefficient (ICC). Results: [...] 0.90). Conclusion: All evaluated LLMs, particularly Copilot and DeepSeek, demonstrated high accuracy in providing information on dental trauma, with stable performance over time. The use of a context prompt did not significantly affect accuracy or stability.
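
For readers unfamiliar with the diagnostic metrics named in the abstract, the following is a minimal sketch of how sensitivity, specificity, and accuracy can be computed from paired true/false labels. It is not the authors' analysis code. The function name `binary_metrics`, the treatment of "true" as the positive class, and the example data are all assumptions made for illustration; the abstract does not state how items were coded.

```python
def binary_metrics(truth, predicted):
    """Compute sensitivity, specificity, and accuracy for paired boolean labels.

    Assumption (not from the paper): each item has a true/false reference
    answer, each model response is scored as true/false, and "true" is
    treated as the positive class.
    """
    if len(truth) != len(predicted):
        raise ValueError("truth and predicted must have the same length")
    if not truth:
        raise ValueError("at least one item is required")

    pairs = list(zip(truth, predicted))
    tp = sum(1 for t, p in pairs if t and p)          # true positives
    tn = sum(1 for t, p in pairs if not t and not p)  # true negatives
    fp = sum(1 for t, p in pairs if not t and p)      # false positives
    fn = sum(1 for t, p in pairs if t and not p)      # false negatives

    # Undefined ratios (no positives or no negatives) are reported as NaN.
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    accuracy = (tp + tn) / len(pairs)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "accuracy": accuracy,
    }


# Hypothetical six-item example (invented data, not study results).
reference = [True, True, True, True, False, False]
model_out = [True, True, True, False, False, False]
print(binary_metrics(reference, model_out))
```

In this invented example one positive item is missed, so sensitivity is 3/4 while specificity is 2/2. AUC and ICC additionally require graded scores or repeated measurements and are not covered by this sketch.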

Authors

Rafaela Mancini Lisboa

Arian Braido

Adriana de Jesus Soares

Journals

Frontiers in Oral Health

Institutions

Universidade Estadual de Campinas (UNICAMP)

All India Institute of Medical Sciences

Universidade Federal de Uberlândia
