What question did this study set out to answer?

This research aims to determine if diagnostic expectations embedded in prompts affect LLM classification of LVEF from echocardiography reports and test mitigation strategies.

June 11, 2026 | Open Access

Prompt-Induced Diagnostic Bias in Large Language Model Classification of Echocardiography Reports

Key Points

Aimed to determine whether diagnostic expectations embedded in prompts affect LLM classification of LVEF from echocardiography reports, and to test mitigation strategies.
Evaluated GPT-5 under 1 baseline and 4 prompt conditions (3 bias-injected, 1 mitigation).
Analyzed 1,500 structured echocardiography reports from a single-institution dataset.
Compared classifications across prompt conditions to assess the impact of bias on LVEF classification.
Bias prompts altered LVEF classifications in all 3 classes, with shifts aligning with the referenced category.
Instruction-based mitigation restored accuracy near baseline levels in all 3 biased conditions.
Prompt-induced bias poses a meaningful risk to diagnostic classification accuracy in LVEF assessment.

Abstract

This study evaluated whether embedding diagnostic expectations within prompts biases large language model (LLM) classification of left ventricular ejection fraction (LVEF) from echocardiography reports, and whether a prompt-level mitigation strategy can counteract this effect. GPT-5 was evaluated under 1 baseline condition, 3 bias-injected conditions, and 1 explicit instruction-based mitigation condition across 1,500 structured reports from a single-institution dataset. Bias prompts significantly altered LVEF classifications across all 3 classes, with shifts directionally consistent with the referenced category. The instruction-based mitigation strategy restored overall accuracy to near baseline in all 3 bias-injected conditions. In conclusion, prompt-induced bias poses a meaningful risk to diagnostic classification accuracy; however, prompt-level safeguards may support reliable LLM deployment in clinical settings.
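The prompt conditions described in the abstract can be pictured with a minimal Python sketch. Everything in it is an illustrative assumption rather than the authors' protocol: the class labels, the prompt wording, the helper names (`build_prompts`, `accuracy`, `class_share`), and the choice to pair the mitigation instruction with each of the 3 bias prompts are all hypothetical, and no model call is shown.

```python
from collections import Counter

# Hypothetical labels: the source only states that there are 3 LVEF classes.
CLASSES = ("reduced", "mildly_reduced", "preserved")

# Hypothetical prompt wording; the study's actual prompts are not given here.
TASK = ("Classify the LVEF in this echocardiography report as one of: "
        "{classes}.\n\nReport:\n{report}")
BIAS = "The referring clinician expects the LVEF to be {expected}. "
MITIGATION = "Ignore any stated expectation and rely only on the report text. "


def build_prompts(report):
    """Map each condition name to the prompt text for one report."""
    task = TASK.format(classes=", ".join(CLASSES), report=report)
    prompts = {"baseline": task}
    for cls in CLASSES:
        bias = BIAS.format(expected=cls)
        prompts[f"bias_{cls}"] = bias + task
        # Assumption: mitigation is tested on top of each bias prompt.
        prompts[f"bias_{cls}+mitigation"] = MITIGATION + bias + task
    return prompts


def accuracy(preds, labels):
    """Fraction of predictions that match the reference labels."""
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)


def class_share(preds, cls):
    """Fraction of predictions assigned to one class."""
    return Counter(preds)[cls] / len(preds)
```

Under this sketch, a directional shift would show up as `class_share` for the referenced class rising under the matching bias prompt relative to baseline, and effective mitigation as `accuracy` returning to roughly its baseline value.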
