This essay builds on the empirical findings reported in the preprint “Explaining Safety Is Not Enforcing Safety: Cross-Vendor Evidence of Contextual, Surface, and Epistemic Failures in Consumer AI Assistants” (OSF DOI: 10.17605/OSF.IO/AXBND). It examines how AI vendors and academic infrastructures respond when confronted with evidence of safety drift in large-scale conversational assistants. The focus is not on new jailbreak techniques, but on the gap between what institutions say about security, openness, and evaluation, and how their processes actually behave in practice. The essay analyzes disclosure and moderation workflows involving Perplexity, Google, Microsoft, Meta, arXiv, and HAL, highlighting three recurring patterns: (1) taxonomic reclassification of behavioral risks as “out of scope”, “ineligible findings”, or “content quality”; (2) silence or indefinite “awaiting moderation” states that avoid generating a contestable record; and (3) quiet removal or deactivation of risky surfaces (e.g., custom AI personas in messaging environments) without public acknowledgement. The central claim is that explaining safety is not the same as enforcing it—and that explaining governance is not the same as practicing it. The essay argues for epistemic honesty and auditable fragility as regulatory targets: institutions should document known failure modes, provide traceability when systems move from “I cannot” to “here is how”, and treat non-response as a relevant governance signal rather than a neutral default.
Evans Tovar studied this question.