What does this research mean for the field?

Multimodal large language models (LLMs) significantly underperform compared to neuroradiologists in interpreting neuroradiology cases and exhibit poor reasoning for differential diagnoses, especially when clinical history is not provided. Novelty: novel finding. Consensus alignment: challenges consensus.

What question did this study set out to answer?

The research aims to assess how accurately multimodal large language models interpret neuroradiology cases compared to experts.

February 27, 2026

Open Access

Evaluating the Accuracy and Diagnostic Reasoning of Multimodal Large Language Models in Interpreting Neuroradiology Cases From RadioGraphics

Key Points

Compared performance of multimodal LLMs and neuroradiologists in interpreting neuroradiology cases.
Analyzed reasoning for differential diagnoses provided by LLMs.
Evaluated impact of absence of clinical history on LLM performance.
Multimodal LLMs significantly underperformed compared to neuroradiologists.
LLMs exhibited unsatisfactory reasoning for differential diagnoses.
Performance declined further for cases lacking textual input of clinical history.

Abstract

LLMs markedly underperformed compared with neuroradiologists and showed unsatisfactory reasoning for their differential diagnoses, with performance declining further in cases without textual input of clinical history. These findings highlight the limitations of current multimodal LLMs in neuroradiological interpretation and their reliance on text input.
