September 13, 2024

The Diagnostic Performance of Large Language Models and General Radiologists in Thoracic Radiology Cases

Abstract

Purpose: To investigate and compare the diagnostic performance of 10 large language models (LLMs) and 2 board-certified general radiologists on thoracic radiology cases published by the Society of Thoracic Radiology.

Materials and Methods: We collected 124 publicly available “Case of the Month” cases published on the Society of Thoracic Radiology website between March 2012 and December 2023. Medical history and imaging findings were input into the LLMs to obtain a diagnosis and differential diagnosis, while the radiologists independently provided their assessments by visual evaluation. Cases were categorized anatomically (parenchyma, airways, mediastinum-pleura-chest wall, and vascular) and further classified as specific or nonspecific for radiologic diagnosis. Diagnostic accuracy and differential diagnosis scores (DDxScore) were analyzed using the χ², Kruskal-Wallis, Wilcoxon, McNemar, and Mann-Whitney U tests.

Results: Among the 124 cases, Claude 3 Opus showed the highest diagnostic accuracy (70.29%), followed by ChatGPT 4/Google Gemini 1.5 Pro (59.75%), Meta Llama 3 70b (57.3%), and ChatGPT 3.5 (53.2%), outperforming the radiologists (52.4% and 41.1%) and the other LLMs (P<0.05). There were no significant differences between the LLMs and the radiologists in diagnostic accuracy within the anatomic subgroups (P>0.05), except for Meta Llama 3 70b in the vascular cases (P=0.040).

Conclusions: Claude 3 Opus outperformed the other LLMs and the radiologists in text-based thoracic radiology cases. LLMs hold great promise for clinical decision systems under proper medical supervision.
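The Materials and Methods list the McNemar test among the analyses. As an illustration only, the sketch below shows how such a test can compare two readers who answered the same set of cases, using the exact (binomial) form. The abstract does not say which variant the authors used, the function name `mcnemar_exact` is hypothetical, and the correctness flags are made-up toy values, not study data.

```python
from math import comb


def mcnemar_exact(correct_a, correct_b):
    """Exact two-sided McNemar test on paired correct/incorrect flags.

    Returns (b, c, p_value), where b counts cases only reader A got right
    and c counts cases only reader B got right.
    """
    if len(correct_a) != len(correct_b):
        raise ValueError("both readers must be scored on the same cases")

    # Only discordant pairs carry information about a difference in accuracy.
    b = sum(1 for a, r in zip(correct_a, correct_b) if a and not r)
    c = sum(1 for a, r in zip(correct_a, correct_b) if r and not a)
    n = b + c
    if n == 0:
        return b, c, 1.0  # no discordant pairs, no evidence of a difference

    # Under the null hypothesis, b follows Binomial(n, 0.5).
    tail = sum(comb(n, i) for i in range(min(b, c) + 1)) / 2 ** n
    return b, c, min(1.0, 2 * tail)


# Hypothetical toy data: True means the diagnosis matched the reference.
model_flags = [True, True, True, False, True, True, False, True, True, True]
reader_flags = [True, False, False, False, True, False, False, True, False, True]

b, c, p = mcnemar_exact(model_flags, reader_flags)
print(f"discordant pairs: b={b}, c={c}, exact p={p:.4f}")
```

The test uses only the discordant cases, which is why it suits paired designs like this one, where every reader sees the same cases.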

Authors

Yasin Celal Güneş

Turay Cesur

Journals

Journal of Thoracic Imaging

Institutions

Kırıkkale University

State Hospital
