Large language models (LLMs) are increasingly explored for radiology-related applications, yet their vulnerability to cognitive biases remains undercharacterized. The aim of this study was to investigate whether targeted prompts exploiting cognitive biases degrade LLM accuracy on radiology board-style questions. Ten contemporary LLMs were evaluated on 200 text-based and 200 multimodal American Board of Radiology examination-style questions under baseline and three cognitive bias prompts: authority bias prompts (ABPs), complexity bias prompts (CBPs), and anchoring bias prompts (AnBPs). Two mitigation approaches, a prompt bias audit and a one-shot mitigation strategy, were also evaluated. Under baseline prompts, models achieved a mean accuracy of 84.8 ± 5.5% (154-186 of 200) for text-based and 59.5 ± 7.7% (101-143 of 200) for multimodal questions. All models showed reduced accuracy to cognitively biased prompts, with ABP, CBP, and AnBP yielding absolute declines of 21.1%, 10.1%, and 4.4%, respectively, for text questions (P P < .001 for each). The prompt bias audit increased accuracy by 5.6% for text-based and 15.8% for multimodal questions, while the one-shot mitigation yielded gains of 4.0% for text questions and 24.9% for multimodal questions. These findings demonstrate that LLMs are susceptible to cognitively biased inputs. ©RSNA, 2026.
Dietrich et al. (Wed,) studied this question.