March 3, 2026 · Open Access

Assessing Large Language Models for Early Article Identification in Otolaryngology—Head and Neck Surgery Systematic Reviews

Key Points

Outputs from large language models showed inaccuracies, yet some relevant articles were identified.
The evaluation indicated that recent articles were often missed by the conventional peer-reviewed reviews used as references.
Systematic reviews employing a PRISMA framework serve as the gold standard for literature evaluation.
Refining large language models for this purpose may improve accuracy and efficiency in literature reviews.

Abstract

Large language models (LLMs) failed to fully replicate peer-reviewed methodologies, producing outputs with inaccuracies but identifying relevant, especially recent, articles missed by the references. While human-led PRISMA-based reviews remain the gold standard, refining LLMs for literature reviews shows potential.

Bakare et al. studied this question.

synapsesocial.com/papers/69a75cb8c6e9836116a25d1f
https://doi.org/10.1002/hcs2.70048
