May 14, 2026 | Open Access

The safety failures we are not instrumenting: a perspective on hidden safety-critical challenges in modern AI systems

Abstract

Current AI safety discourse still focuses disproportionately on visible failures, including obvious harms, dramatic misuse, and hypothetical catastrophic scenarios. That focus is incomplete. In deployed systems, many of the most consequential failures are quieter: plausible rather than spectacular, distributed across components rather than localized in a single output, and normalized by workflows before they are recognized as hazards. We argue that a central safety challenge in modern AI systems is increasingly not only whether a model emits a harmful response, but whether the broader socio-technical system preserves the conditions under which errors remain visible, contestable, containable, and recoverable. We propose a five-layer framework for diagnosing these hidden risks: (1) epistemic integrity, concerning whether evidence and uncertainty are represented honestly enough to support calibrated reliance; (2) control integrity, concerning whether authority, permissions, and action boundaries remain robust under attack and optimization; (3) temporal integrity, concerning whether safety holds across sessions, memory updates, and deployment drift; (4) organizational integrity, concerning whether institutions retain the capacity to audit, assign responsibility, and intervene effectively; and (5) ecosystem integrity, concerning whether AI systems preserve rather than erode the information environment on which future oversight depends. Across these layers, we identify under-recognized risk patterns, including overreliance, uncertainty and legitimacy laundering in retrieval, prompt injection, reward hacking, memory poisoning, evaluation deception, fictional human oversight, synthetic evidence pollution, and model collapse. We conclude with actionable design and governance recommendations and a research agenda for shifting AI safety from narrow model-centric evaluation toward socio-technical reliability.

Authors

Gjergji Kasneci

Enkelejda Kasneci

Journals

AI and Ethics

Institutions

Technical University of Munich
