Abstract

Current AI safety discourse still focuses disproportionately on visible failures, including obvious harms, dramatic misuse, and hypothetical catastrophic scenarios. That focus is incomplete. In deployed systems, many of the most consequential failures are quieter: plausible rather than spectacular, distributed across components rather than localized in a single output, and normalized by workflows before they are recognized as hazards. We argue that a central safety challenge in modern AI systems is increasingly not only whether a model emits a harmful response, but whether the broader socio-technical system preserves the conditions under which errors remain visible, contestable, containable, and recoverable. We propose a five-layer framework for diagnosing these hidden risks: (1) epistemic integrity, concerning whether evidence and uncertainty are represented honestly enough to support calibrated reliance; (2) control integrity, concerning whether authority, permissions, and action boundaries remain robust under attack and optimization; (3) temporal integrity, concerning whether safety holds across sessions, memory updates, and deployment drift; (4) organizational integrity, concerning whether institutions retain the capacity to audit, assign responsibility, and intervene effectively; and (5) ecosystem integrity, concerning whether AI systems preserve rather than erode the information environment on which future oversight depends. Across these layers, we identify under-recognized risk patterns, including overreliance, uncertainty and legitimacy laundering in retrieval, prompt injection, reward hacking, memory poisoning, evaluation deception, fictional human oversight, synthetic evidence pollution, and model collapse. We conclude with actionable design and governance recommendations and a research agenda for shifting AI safety from narrow model-centric evaluation toward socio-technical reliability.
Gjergji Kasneci
Enkelejda Kasneci
AI and Ethics
Technical University of Munich
Kasneci et al. studied this question.
www.synapsesocial.com/papers/6a05680ea550a87e60a205a7 — DOI: https://doi.org/10.1007/s43681-026-01132-0