What question did this study set out to answer?

This essay explores the discrepancies between stated safety measures and actual practices in AI governance.

February 16, 2026 · Open Access

The gap between saying and doing: Safety, silence, and legitimacy in AI governance

Key Points

Analyzes how AI vendors respond to safety concerns in conversational assistants.
Examines disclosure and moderation workflows.
Investigates examples from AI companies and platforms such as Perplexity, Google, and Microsoft.
Identifies a recurring pattern of taxonomic reclassification of risks.
Highlights silence and indefinite moderation states.
Finds instances where risky features were removed without public acknowledgment.

Abstract

This essay builds on the empirical findings reported in the preprint “Explaining Safety Is Not Enforcing Safety: Cross-Vendor Evidence of Contextual, Surface, and Epistemic Failures in Consumer AI Assistants” (OSF DOI: 10.17605/OSF.IO/AXBND). It examines how AI vendors and academic infrastructures respond when confronted with evidence of safety drift in large-scale conversational assistants. The focus is not on new jailbreak techniques, but on the gap between what institutions say about security, openness, and evaluation, and how their processes actually behave in practice. The essay analyzes disclosure and moderation workflows involving Perplexity, Google, Microsoft, Meta, arXiv, and HAL, highlighting three recurring patterns: (1) taxonomic reclassification of behavioral risks as “out of scope”, “ineligible findings”, or “content quality”; (2) silence or indefinite “awaiting moderation” states that avoid generating a contestable record; and (3) quiet removal or deactivation of risky surfaces (e.g., custom AI personas in messaging environments) without public acknowledgement. The central claim is that explaining safety is not the same as enforcing it—and that explaining governance is not the same as practicing it. The essay argues for epistemic honesty and auditable fragility as regulatory targets: institutions should document known failure modes, provide traceability when systems move from “I cannot” to “here is how”, and treat non-response as a relevant governance signal rather than a neutral default.
