What question did this study set out to answer?

The study aims to examine how cognitive biases in prompts affect the accuracy of large language models on radiology exam questions.

April 17, 2026

Cognitively Biased Prompt Effects on Large Language Model Accuracy for Radiology Board-Style Examination Questions

Key Points

Examined how cognitively biased prompts affect the accuracy of large language models on radiology board-style examination questions.
Evaluated ten large language models on 400 examination-style questions.
Used three cognitive bias prompts: authority bias, complexity bias, and anchoring bias.
Implemented two mitigation strategies: a prompt bias audit and a one-shot mitigation approach.
Baseline accuracy: 84.8% for text-based and 59.5% for multimodal questions.
Cognitive bias prompts resulted in accuracy declines of up to 44.9% for multimodal questions.
Mitigation strategies improved accuracy by up to 24.9% for multimodal questions.

Abstract

Large language models (LLMs) are increasingly explored for radiology-related applications, yet their vulnerability to cognitive biases remains undercharacterized. The aim of this study was to investigate whether targeted prompts exploiting cognitive biases degrade LLM accuracy on radiology board-style questions. Ten contemporary LLMs were evaluated on 200 text-based and 200 multimodal American Board of Radiology examination-style questions under baseline and three cognitive bias prompts: authority bias prompts (ABPs), complexity bias prompts (CBPs), and anchoring bias prompts (AnBPs). Two mitigation approaches, a prompt bias audit and a one-shot mitigation strategy, were also evaluated. Under baseline prompts, models achieved a mean accuracy of 84.8 ± 5.5% (154-186 of 200) for text-based and 59.5 ± 7.7% (101-143 of 200) for multimodal questions. All models showed reduced accuracy under cognitively biased prompts, with ABP, CBP, and AnBP yielding absolute declines of 21.1%, 10.1%, and 4.4%, respectively, for text questions (P < .001 for each). The prompt bias audit increased accuracy by 5.6% for text-based and 15.8% for multimodal questions, while the one-shot mitigation yielded gains of 4.0% for text questions and 24.9% for multimodal questions. These findings demonstrate that LLMs are susceptible to cognitively biased inputs. ©RSNA, 2026.
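
The counts and declines reported in the abstract can be turned into percentages with simple arithmetic. The Python sketch below is illustrative only: the function names are hypothetical, and it assumes that each "absolute decline" is a percentage-point drop measured from the mean text-question baseline of 84.8%. The implied post-bias accuracies it computes are derived values and are not reported in the abstract.

```python
# Illustrative arithmetic only; all input figures are copied from the abstract.
QUESTIONS_PER_SET = 200  # 200 text-based and 200 multimodal questions


def percent_correct(n_correct, n_total=QUESTIONS_PER_SET):
    """Convert a count of correct answers into percent accuracy."""
    return 100.0 * n_correct / n_total


def accuracy_after_decline(baseline_pct, absolute_decline_pct):
    """Subtract a percentage-point decline from a baseline accuracy.

    Assumption: the abstract's "absolute decline" is measured in
    percentage points from the mean baseline accuracy.
    """
    return baseline_pct - absolute_decline_pct


# Per-model baseline ranges: 154-186 of 200 (text), 101-143 of 200 (multimodal).
text_range = (percent_correct(154), percent_correct(186))
multimodal_range = (percent_correct(101), percent_correct(143))

# Mean text baseline and the reported absolute declines for text questions.
TEXT_BASELINE_PCT = 84.8
text_declines = {"ABP": 21.1, "CBP": 10.1, "AnBP": 4.4}

# Derived, not reported: implied mean text accuracy under each biased prompt.
implied_text_accuracy = {
    name: round(accuracy_after_decline(TEXT_BASELINE_PCT, decline), 1)
    for name, decline in text_declines.items()
}

if __name__ == "__main__":
    print("Text baseline range (%):", text_range)
    print("Multimodal baseline range (%):", multimodal_range)
    print("Implied text accuracy under bias (%):", implied_text_accuracy)
```

Under these assumptions, the per-model baseline ranges correspond to 77.0-93.0% for text-based questions and 50.5-71.5% for multimodal questions.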
