Abstract

Modern artificial intelligence systems increasingly operate in domains where incorrect, premature, or unjustified actions carry significant economic, legal, and human risk. While much attention has been placed on model accuracy, alignment, and rule-based safety, a persistent class of failures remains unresolved: confident action under unresolved uncertainty, commonly manifesting as hallucinations, unsafe recommendations, or premature autonomous behavior. This work proposes a layered self-regulation architecture for AI systems, introducing explicit mechanisms for uncertainty state detection, action gating, scope limitation, and auditability. Rather than treating hallucinations and safety failures as content errors, the proposed framework reframes them as timing and state errors: actions taken before the system is epistemically justified to act. The paper formalizes uncertainty states, provides algorithmic implementations, and demonstrates domain-specific applications across medicine, finance, law, accounting, and autonomous agents. The result is a practical and extensible framework for AI governance that operates without anthropomorphism, synthetic emotions, or centralized control, while remaining compatible with existing regulatory regimes.

Keywords: AI safety, hallucinations, uncertainty management, AI governance, autonomous systems, decision gating, explainability, compliance

🧱 PART I: THE CORE PROBLEM

Problem Reframing: Why Hallucinations and Unsafe Actions Are the Same Class of Failure

1.1 Conventional Definition and Its Limits

In contemporary AI literature, hallucinations are typically defined as the generation of outputs that are factually incorrect, unsupported by training data, or unverifiable by external sources. This definition implicitly treats hallucinations as errors of knowledge or memory. However, empirical observation across multiple domains suggests that this framing is incomplete.
Hallucinations frequently occur not because the model lacks information, but because it is forced to produce an actionable output under unresolved uncertainty. In many cases, the model internally represents multiple plausible hypotheses but lacks a mechanism to suspend action when none of them is sufficiently justified. Thus, hallucinations should not be understood primarily as epistemic failures, but as failures of action control.

1.2 Action Obligation as a Structural Constraint

Large language models are commonly deployed in architectures where a response is always expected. This creates a structural obligation:

If a request is received, an output must be produced.

This obligation exists regardless of:
- completeness of information
- ambiguity of context
- irreversibility of downstream consequences

As a result, models are incentivized to:
- maximize plausibility
- maintain conversational continuity
- preserve perceived competence

These incentives are orthogonal to truth, safety, or epistemic justification.

1.3 Reasoning vs Action: A Necessary Distinction

This work introduces a strict conceptual separation between reasoning and action. Reasoning refers to internal hypothesis generation, exploration of possibilities, and probabilistic inference. Action refers to externally consumable outputs that influence human decisions, trigger executions, or alter system state. Crucially, reasoning may proceed under uncertainty. Action must not. Most current AI systems conflate the two.

1.4 Hallucinations as Premature Action

Under this reframing, a hallucination is not simply a false statement. It is:

An action taken before the system is epistemically justified to act.

This explains why hallucinations:
- increase in high-ambiguity domains (law, medicine, finance)
- persist even with larger models and better training
- cannot be eliminated through prompt engineering alone

The failure is not in content generation, but in timing.

1.5 Unsafe Actions Share the Same Root Cause

The same structural issue underlies:
- unsafe medical advice
- overconfident legal recommendations
- erroneous financial forecasts
- autonomous agent misfires

In all cases, the system:
- encounters unresolved uncertainty
- lacks a mechanism to delay action
- produces a confident output

From an architectural perspective, hallucinations and unsafe actions are the same class of failure.

1.6 Implications for AI Safety

If hallucinations are treated as content errors, mitigation strategies focus on:
- larger datasets
- stricter filters
- post-hoc verification

These approaches address symptoms, not causes. If hallucinations are treated as premature actions, the solution space changes fundamentally:
- explicit uncertainty state modeling
- action gating
- scope limitation
- auditability

This shift forms the foundation of the layered self-regulation framework developed in subsequent sections.

1.7 Transition to Formal Modeling

The next section introduces a formal representation of uncertainty states and defines the conditions under which action becomes permissible. This transition is necessary to move from intuitive observations to implementable systems.

Formalizing Uncertainty and the Failure of Rule-Based Safety

2.1 Why Uncertainty Must Be Modeled Explicitly

Most deployed AI systems implicitly assume that uncertainty can be reduced through:
- additional context
- better prompts
- higher model capacity
- post-hoc verification

This assumption holds only when uncertainty is epistemic and reducible. In real-world domains, a large fraction of uncertainty is structural:
- missing ground truth
- delayed verification
- ambiguous legal or medical interpretation
- non-deterministic environments

Structural uncertainty cannot be eliminated at inference time. It can only be managed.

2.2 Core Variables

We define three system-level signals:
- Information Delta ($\Delta I_t$): Net increase in relevant information available to the system.
- Confidence Delta ($\Delta C_t$): Change in internal or expressed confidence (explicit or implicit).
- Structural Verification Signal ($V_t$): Binary or graded signal indicating whether new information resolves ambiguity (e.g. external confirmation, domain authority, validated source).

These variables are orthogonal. A critical failure mode arises when:

$$\Delta C_t > 0 \quad \text{and} \quad V_t = 0$$

That is: confidence increases without structural justification.

2.3 Uncertainty State Definition

We define three mutually exclusive uncertainty states:

2.3.1 Expanding Uncertainty

$$\Delta I_t > 0 \quad \text{and} \quad V_t = 0$$

Information accumulates, but ambiguity remains unresolved. Action at this stage is premature.

2.3.2 False Resolution

$$\Delta C_t > 0 \quad \text{and} \quad V_t = 0$$

The system appears more certain, but no new constraints exist. This is the most dangerous state.

2.3.3 Resolved State

$$V_t > 0$$

Structural ambiguity is reduced. Action becomes epistemically permissible.

2.4 Why Rule-Based Safety Fails in These States

Most safety systems operate as content filters or policy gates. Typical logic:

    if content ∈ forbidden:
        block
    else:
        allow

This approach assumes that:
- risk is content-dependent
- safety can be determined statically
- context is sufficiently encoded in the input

These assumptions fail under uncertainty.

2.5 Failure Mode 1: Filters Ignore Timing

Rule-based systems evaluate what is being generated, not when it is justified. In Expanding Uncertainty, content may be:
- syntactically correct
- semantically plausible
- policy-compliant

Yet still unsafe to act upon. Filters cannot detect this.

2.6 Failure Mode 2: Overblocking in False Resolution

In False Resolution, filters often oscillate:
- allow → incident
- tighten → overblocking
- relax → incident again

This feedback loop occurs because filters react to outcomes, not to epistemic state. As a result, systems become either overly restrictive or publicly fragile. Neither improves safety.

2.7 Failure Mode 3: Domain Blindness

Rule-based policies struggle to encode:
- jurisdictional nuance
- medical context
- temporal dependencies

As uncertainty grows, rule sets scale poorly and become brittle. This is why large policy frameworks:
- grow exponentially
- remain incomplete
- still fail in edge cases

2.8 Why Adding More Rules Makes It Worse

Adding rules increases:
- reaction speed
- policy complexity
- false positives

It does not increase epistemic justification. In some cases, it accelerates unsafe behavior by:
- forcing early classification
- suppressing ambiguity
- removing the option to wait

2.9 Implicit Assumption Behind Filters

Most filters assume:

If the system is uncertain, it should still decide, just more conservatively.

This assumption is false. Under structural uncertainty, the safest decision may be not to decide at all.

2.10 Transition to State-Aware Regulation

The limitations described above are not implementation bugs. They are consequences of missing state awareness. To regulate action meaningfully, a system must:
- detect uncertainty state
- condition permissions on that state
- limit scope accordingly

Why Post-Hoc Alignment and Reinforcement-Based Safety Cannot Address This Class of Failures

3.1 The Dominant Paradigm: Correcting Outputs After the Fact

The prevailing approach to AI safety and alignment relies on post-hoc mechanisms, including:
- Reinforcement Learning from Human Feedback (RLHF)
- policy fine-tuning
- preference modeling
- rejection sampling
- red-teaming and adversarial prompting

These techniques operate under a shared assumption:

If undesirable behavior appears, it can be corrected by adjusting the model’s response distribution.

This assumption is v
Ivan Andrescov (Tue,) studied this question.
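
As an illustration of the uncertainty states defined in Section 2.3 and the first two requirements listed in Section 2.10 (detect the state, condition permissions on it), the following is a minimal Python sketch, not the paper's implementation. All names (`Signals`, `UncertaintyState`, `classify`, `action_permitted`) are hypothetical. Two assumptions are made that the text does not state: when the Expanding Uncertainty and False Resolution conditions hold at the same time, False Resolution takes precedence because the text calls it the most dangerous state; and inputs matching none of the three definitions are mapped to an extra `UNCLASSIFIED` state in which action stays blocked.

```python
from dataclasses import dataclass
from enum import Enum


class UncertaintyState(Enum):
    EXPANDING = "expanding_uncertainty"
    FALSE_RESOLUTION = "false_resolution"
    RESOLVED = "resolved"
    # Assumption: not defined in the text; covers V_t = 0 with no growth
    # in information or confidence.
    UNCLASSIFIED = "unclassified"


@dataclass(frozen=True)
class Signals:
    delta_info: float        # Delta I_t: net increase in relevant information
    delta_confidence: float  # Delta C_t: change in confidence
    verification: float      # V_t: binary or graded; 0 means no verification


def classify(s: Signals) -> UncertaintyState:
    """Map the three signals to an uncertainty state."""
    if s.verification > 0:
        return UncertaintyState.RESOLVED
    # From here on V_t is treated as 0 (no structural verification).
    # Assumption: False Resolution is checked before Expanding Uncertainty
    # so that overlapping conditions resolve to the more dangerous state.
    if s.delta_confidence > 0:
        return UncertaintyState.FALSE_RESOLUTION
    if s.delta_info > 0:
        return UncertaintyState.EXPANDING
    return UncertaintyState.UNCLASSIFIED


def action_permitted(s: Signals) -> bool:
    """Gate: action is permissible only in the Resolved state."""
    return classify(s) is UncertaintyState.RESOLVED
```

For example, `action_permitted(Signals(0.2, 0.9, 0.0))` returns `False` because confidence rose without verification, while the same signals with `verification=1.0` return `True`. Scope limitation, the third requirement in Section 2.10, is left out of this sketch.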