February 15, 2026

Open Access

A Cognitive Risk Taxonomy for AI-Assisted Emergency ECG Interpretation: Mapping Algorithmic Error to Physician Bias Pathways

Structured PICO

Does AI-assisted ECG interpretation (GE 12SL algorithm) agree with expert cardiologist annotations for emergency-critical diagnoses, and what are the cognitive risks of algorithmic errors?

Population

21,799 dual-labelled ECGs from the PTB-XL+ dataset

Intervention

GE 12SL algorithm output (AI-assisted ECG interpretation)

Comparator

Expert cardiologist annotations

Outcome

Disagreement/discordance between expert annotations and AI output across 54 emergency-critical SNOMED CT concepts (including acute myocardial infarction, life-threatening arrhythmias, conduction blocks, and ischaemia)

AI ECG interpretation algorithms frequently disagree with expert cardiologists on critical diagnoses and may actively mislead physicians by confidently substituting incorrect labels for missed lethal conditions. As a result, the AI–physician combination is projected to perform worse than the physician alone.

Abstract

Background: Artificial intelligence algorithms for electrocardiogram (ECG) interpretation are now standard in emergency departments worldwide, yet the assumption that AI and physician errors are complementary, and therefore self-correcting, has never been systematically tested against the cognitive realities of emergency medicine practice.

Objective: To develop a cognitive risk taxonomy that maps specific AI error patterns through specific physician bias pathways to specific clinical risk predictions for emergency-critical ECG diagnoses.

Methods: We analysed 21,799 dual-labelled ECGs from the PTB-XL+ dataset, comparing expert cardiologist annotations against GE 12SL algorithm output across 54 emergency-critical SNOMED CT concepts spanning acute myocardial infarction, life-threatening arrhythmias, conduction blocks, and ischaemia. A five-stage disagreement analysis framework quantified error direction, magnitude, confidence profiles, compound co-occurrence patterns, and diagnostic substitution profiles. Each disagreement signature was mapped to cognitive bias pathways derived from dual-process theory and scored on a composite of frequency, clinical severity, and bias amplification potential.

Results: Of 106,401 non-trivial comparisons, 93.7% were discordant (6.3% agreement), with 81% of the dataset carrying at least one emergency-critical disagreement (mean 5.7 per affected ECG). Eight named disagreement signatures were identified, organised into a two-tier, four-class taxonomy: Lethal Diagnosis Miss (anteroseptal MI blind spot, SVT/VT inversion, LBBB–STEMI mask), Mechanism Blindness (bradycardia inflation, fascicular desert), Signal Corruption (ischaemia noise floor, old MI avalanche), and Self-Undermining AI (QT alarm fatigue). Seven were classified as CRITICAL risk. The AI’s binary confidence architecture delivered 81% of overcalls at maximum confidence with no hedging, creating near-maximum anchoring potential for every error. Diagnostic substitution profiling revealed that AI misses are not silent omissions but active reframings: when the AI misses anteroseptal MI, it labels “Old MI” on 60.6% of those ECGs; when it misses ventricular tachycardia, it labels “SVT” on 45.8%.

Conclusions: The Compound Risk Hypothesis was validated across all eight danger zones: in every case, the AI–physician combination was projected to perform worse than the physician alone through three mechanisms: Direct Suppression, Capability Destruction, and Environmental Contamination. A seven-component safeguard framework targeting class-specific bias pathways was developed, designed to be net-negative on alert burden. The taxonomy framework is system-agnostic and designed for reapplication to any AI system with dual-label validation data.
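The discordance rate and the diagnostic substitution profile described in the abstract can be illustrated with a minimal Python sketch. It is not the study's actual pipeline. It assumes that a "non-trivial comparison" is an (ECG, concept) pair in which at least one of the two raters asserted the concept, and that a substitution profile is the share of missed ECGs on which the algorithm emitted each alternative label. The `Ecg` class, both function names, the short label strings (which stand in for SNOMED CT concepts), and the toy records are all hypothetical.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Ecg:
    """One dual-labelled ECG (hypothetical structure, not the PTB-XL+ schema)."""
    expert: frozenset  # concepts asserted by the cardiologist
    ai: frozenset      # concepts asserted by the algorithm


def discordance(records, concepts):
    """Tally comparisons where at least one rater asserted the concept.

    Assumption: pairs where neither rater asserted the concept are
    treated as trivial and excluded from the denominator.
    """
    agree = miss = overcall = 0
    for rec in records:
        for concept in concepts:
            in_expert = concept in rec.expert
            in_ai = concept in rec.ai
            if in_expert and in_ai:
                agree += 1
            elif in_expert:
                miss += 1      # expert asserted it, algorithm did not
            elif in_ai:
                overcall += 1  # algorithm asserted it, expert did not
    total = agree + miss + overcall
    rate = (miss + overcall) / total if total else 0.0
    return {"total": total, "agree": agree, "miss": miss,
            "overcall": overcall, "discordant_rate": rate}


def substitution_profile(records, missed):
    """Share of ECGs with a missed concept on which the AI gave each other label."""
    missed_records = [r for r in records
                      if missed in r.expert and missed not in r.ai]
    if not missed_records:
        return {}
    counts = Counter(label for r in missed_records for label in r.ai)
    return {label: n / len(missed_records) for label, n in counts.items()}


# Toy records for illustration only; these are not study data.
records = [
    Ecg(frozenset({"ASMI"}), frozenset({"OLD_MI"})),
    Ecg(frozenset({"ASMI"}), frozenset({"OLD_MI", "ISCH"})),
    Ecg(frozenset({"ASMI"}), frozenset({"ASMI"})),
    Ecg(frozenset({"VT"}), frozenset({"SVT"})),
    Ecg(frozenset(), frozenset()),
]
concepts = {"ASMI", "OLD_MI", "ISCH", "VT", "SVT"}

summary = discordance(records, concepts)
profile = substitution_profile(records, "ASMI")
```

On the toy records, `summary` counts eight non-trivial comparisons, of which seven are discordant, and `profile` shows the hypothetical "OLD_MI" label on both ECGs where "ASMI" was missed. Excluding pairs where neither rater asserted a concept keeps the agreement figure from being dominated by shared negatives, which is one plausible reading of how a 6.3% agreement rate could arise.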
