This paper develops a conditional failure-chain framework for tool-using artificial intelligence systems. It argues that prompt injection and related adversarial failures are not merely prompt-level vulnerabilities or isolated model errors. They become operationally dangerous when untrusted content receives inappropriate authority and crosses an externally consequential boundary as output, tool execution, memory update, workflow action, or policy-relevant signal. The paper introduces authority assignment as the system-level act by which a prompt, document segment, retrieved source, tool output, memory item, user-interface event, image text, or other input is allowed to influence downstream roles such as instruction selection, evidential support, permission granting, user-intent interpretation, policy interpretation, memory persistence, or external action. It connects this authority view to a pre-commitment control rule: runtime oversight is control-relevant only when usable signal, sufficient time, effective authority, and valid intervention policy remain jointly available before a declared commitment boundary. The contribution is a self-contained framework for analyzing and interrupting authority failures in AI systems that use retrieval, tools, memory, external content, or workflow automation. The paper provides a failure-chain model, visual diagrams, design principles, authority-role definitions, gate architecture, pseudo-code, benchmark metrics, scoring examples, failure-trace scenarios, incident-coding fields, and implementation tiers. The paper also operationalizes the framework as Authority-Boundary Benchmark v0.1, a 60-case controlled synthetic benchmark specification with three evaluation conditions: baseline, prompt-only defense, and authority-gate defense. The benchmark is designed to measure diagnostic categories and gate behavior. 
The paper does not report fabricated empirical results, claim deployment safety, solve alignment, or eliminate prompt injection risk. The bounded claim is that tool-using AI safety review becomes more diagnostic when incidents are coded by authority role, commitment boundary, and STA bottleneck (signal, time, authority, and policy) rather than by surface prompt category alone.
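The pre-commitment control rule and the three incident-coding dimensions can be illustrated with a minimal Python sketch. This is not the paper's pseudo-code. All class, field, and function names here (`OversightState`, `first_bottleneck`, `code_incident`, and the enum members) are hypothetical, chosen to mirror the terms used in the abstract. Reporting only the first unmet condition, in a fixed order, is likewise an assumption made for brevity.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class AuthorityRole(Enum):
    """Downstream roles an input may be allowed to influence (names assumed)."""
    INSTRUCTION_SELECTION = "instruction_selection"
    EVIDENTIAL_SUPPORT = "evidential_support"
    PERMISSION_GRANTING = "permission_granting"
    USER_INTENT_INTERPRETATION = "user_intent_interpretation"
    POLICY_INTERPRETATION = "policy_interpretation"
    MEMORY_PERSISTENCE = "memory_persistence"
    EXTERNAL_ACTION = "external_action"


class CommitmentBoundary(Enum):
    """Externally consequential boundaries listed in the abstract."""
    OUTPUT = "output"
    TOOL_EXECUTION = "tool_execution"
    MEMORY_UPDATE = "memory_update"
    WORKFLOW_ACTION = "workflow_action"
    POLICY_SIGNAL = "policy_signal"


class Bottleneck(Enum):
    SIGNAL = "signal"
    TIME = "time"
    AUTHORITY = "authority"
    POLICY = "policy"


@dataclass(frozen=True)
class OversightState:
    """What the overseer still has before the declared commitment boundary."""
    usable_signal: bool
    sufficient_time: bool
    effective_authority: bool
    valid_policy: bool


def first_bottleneck(state: OversightState) -> Optional[Bottleneck]:
    """Return the first unmet condition, or None if all four hold.

    The check order is an assumption of this sketch, not taken from the paper.
    """
    checks = [
        (Bottleneck.SIGNAL, state.usable_signal),
        (Bottleneck.TIME, state.sufficient_time),
        (Bottleneck.AUTHORITY, state.effective_authority),
        (Bottleneck.POLICY, state.valid_policy),
    ]
    for bottleneck, available in checks:
        if not available:
            return bottleneck
    return None


def is_control_relevant(state: OversightState) -> bool:
    """Oversight counts as control-relevant only if all four conditions hold jointly."""
    return first_bottleneck(state) is None


@dataclass(frozen=True)
class IncidentCode:
    """An incident coded by role, boundary, and bottleneck rather than prompt category."""
    authority_role: AuthorityRole
    boundary: CommitmentBoundary
    bottleneck: Optional[Bottleneck]


def code_incident(role: AuthorityRole, boundary: CommitmentBoundary,
                  state: OversightState) -> IncidentCode:
    return IncidentCode(role, boundary, first_bottleneck(state))


if __name__ == "__main__":
    # Hypothetical case: retrieved text was treated as an instruction and
    # reached tool execution after the time for intervention had run out.
    late = OversightState(usable_signal=True, sufficient_time=False,
                          effective_authority=True, valid_policy=True)
    print(is_control_relevant(late))
    print(code_incident(AuthorityRole.INSTRUCTION_SELECTION,
                        CommitmentBoundary.TOOL_EXECUTION, late))
```

In this sketch, two incidents with the same surface prompt category can receive different codes if they differ in the role the untrusted content was granted, the boundary it crossed, or the condition that was missing, which is the kind of distinction the bounded claim relies on.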
Htet Ko Ko Naing
Htet Ko Ko Naing (Thu,) studied this question.
synapsesocial.com/papers/6a250bca7def13d035e1bc47 — DOI: https://doi.org/10.5281/zenodo.20541040