Los puntos clave no están disponibles para este artículo en este momento.
BACKGROUND Artificial Intelligence advancements have enabled Large Language Models to significantly impact radiology education and diagnostic accuracy. OBJECTIVE This study evaluates the performance of mainstream Large Language Models, including GPT-4, Claude, Bard, Tongyi Qianwen, and Gemini Pro, in radiology board exams. METHODS A comparative analysis of 150 multiple-choice questions from radiology board exams without images was conducted. Models were assessed on accuracy in text-based questions categorized by cognitive levels and medical specialties using chi-square tests and ANOVA. RESULTS GPT-4 achieved the highest accuracy (83.3%), significantly outperforming others. Tongyi Qianwen also performed well (70.7%). Performance varied across question types and specialties, with GPT-4 excelling in both lower-order and higher-order questions, while Claude and Bard struggled with complex diagnostic questions. CONCLUSIONS GPT-4 and Tongyi Qianwen show promise in medical education and training. The study emphasizes the need for domain-specific training datasets to enhance large models' effectiveness in specialized fields like radiology.
Boxiong Wei (Sun,) studied this question.
Synapse has enriched 5 closely related papers on similar clinical questions. Consider them for comparative context: