Machine learning research frames edge cases as statistical outliers requiring adversarial stress testing. In medicine, this frame is a category error. Clinical complexity—atypical presentations, comorbidities, ambiguous symptoms, rare but catastrophic conditions—is not deviation from normal operating conditions. It is normal operating conditions. This paper argues that medical AI evaluation built on the ML edge-case model systematically misrepresents clinical reality, enabling a class of silent failures that are invisible to automated evaluation and non-expert reviewers but recognizable to clinicians. The most dangerous failures in medical AI are not dramatic; they are plausible, calm, and precisely timed to arrive when urgency is required. This analysis proposes a governance principle: no AI system may be evaluated on human-relevant edge cases without human clinical expertise governing the process. This work extends substrate governance and APR-Lite frameworks previously developed by the author for AI output governance in regulated industries (Soft Armor Labs, 2024–2026).
Narnaiezzsshaa Truong studied this question.