Performance of Large Language Models on Diagnostic Radiology Board–Style Questions: A Comparative Evaluation of GPT-4o, Perplexity AI, and OpenEvidence
Key Points
The study aims to evaluate the performance of emerging large language models (LLMs) on diagnostic radiology questions.
Comparative evaluation of GPT-4o, Perplexity AI, and OpenEvidence on diagnostic questions.
Utilized board-style questions relevant to diagnostic radiology to assess accuracy and reliability.
Emerging LLMs such as Perplexity AI and OpenEvidence demonstrated higher diagnostic reliability than general-purpose models.
Performance metrics indicate that these emerging LLMs answered radiology questions more accurately than general-purpose models.
Abstract
Emerging LLMs such as Perplexity AI and OpenEvidence may offer greater diagnostic reliability than general-purpose models in radiology-specific contexts.