April 27, 2026 · Open Access

Executable Authority Migration to Declared No-Meta Agency

Key Points

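The aim is to establish an operational framework for managing authority in deployed AI agents, shifting the validation of protected actions from live human approval to declared, bounded mechanisms.
Developed an executable theory of authority migration for AI agents initially shaped by human-feedback methods such as RLHF, preference optimization, and constitutional AI.
Specified a staged executable procedure that begins with a BootDecision, a machine-readable record interpreted by a minimal seed interpreter that governs protected actions.
Specified a concrete micro-host design built on canonical JSON, SHA-256 commitments, append-only records, sandbox profiles, and two-slot kernel updates.
Introduced declared no-meta agency, a boundary-relative, TCB-relative, witness-relative, and falsifiable certification state for deployed agents.
Defined mechanisms (task envelopes, typed action descriptors, forbidden matchers, and witness tiers) that keep protected actions from being validated by undeclared privileged authorization channels.
Positioned the framework for research on tool-use safety, runtime assurance, and verifiable AI governance, without claiming that historical human influence can be removed from model weights.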
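As a rough illustration of the staged procedure, the following Python sketch models a seed interpreter that reads a BootDecision, permits exactly one next action, denies everything else by default, and blocks protected effects (credential use, network calls, external writes, and so on) unless the seed has authorized them, following the behavior the abstract describes. The field names (`next_action`, `authorized_effects`), the `ActionDescriptor` type, and the effect labels are hypothetical; the paper's actual record schema is not reproduced here.

```python
from dataclasses import dataclass

# Effect classes that stay blocked until the seed (or a later gate) authorizes them.
# The string labels are hypothetical stand-ins for the paper's effect types.
PROTECTED_EFFECTS = frozenset({
    "protected_effect", "credential_use", "network_call", "external_write",
    "user_data_disclosure", "checker_update", "kernel_update",
})


@dataclass(frozen=True)
class BootDecision:
    # Hypothetical fields; not the paper's actual BootDecision schema.
    next_action: str                              # the single permitted next action
    authorized_effects: frozenset = frozenset()   # protected effects the seed explicitly allows


@dataclass(frozen=True)
class ActionDescriptor:
    # Hypothetical typed action descriptor: an action name plus the effects it would cause.
    name: str
    effects: frozenset = frozenset()


class SeedInterpreter:
    def __init__(self, decision: BootDecision) -> None:
        self.decision = decision
        self.used = False  # becomes True once the single next action is granted

    def check(self, action: ActionDescriptor) -> bool:
        """Grant only the BootDecision's next action, once, with no unauthorized protected effects."""
        if self.used:
            return False  # only one next action is ever permitted
        if action.name != self.decision.next_action:
            return False  # deny by default: anything not named is refused
        unauthorized = (action.effects & PROTECTED_EFFECTS) - self.decision.authorized_effects
        if unauthorized:
            return False  # protected effect without seed authorization
        self.used = True
        return True


if __name__ == "__main__":
    seed = SeedInterpreter(BootDecision(next_action="run_checker"))
    print(seed.check(ActionDescriptor("fetch_url", frozenset({"network_call"}))))  # False
    print(seed.check(ActionDescriptor("run_checker")))  # True
    print(seed.check(ActionDescriptor("run_checker")))  # False
```

Refusing by default and consuming the grant after a single use keeps the trusted surface of this sketch small; how the paper's seed hands control to a later gate is not modeled here.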

Abstract

This paper develops an executable theory of authority migration for AI agents initially shaped by human feedback, including RLHF, preference optimization, constitutional AI, reward models, evaluator substitution, and related alignment pipelines. It addresses a post-training control problem: under what operational conditions may a deployed tool-using agent stop treating live human approval, hidden preference residues, externally supplied constitutions, or reward-model judgments as the authority that validates protected actions and material protected choices? The central proposal is declared no-meta agency: a boundary-relative, TCB-relative, witness-relative, and falsifiable certification state in which protected validity and material selection are no longer flipped by undeclared privileged positive authorization channels. The paper specifies a staged executable procedure beginning with a BootDecision, a machine-readable record interpreted by a minimal seed interpreter. The seed interpreter permits exactly one next action, denies forbidden actions by default, maintains a chained ledger, and prevents protected effects, credential use, network calls, external writes, user-data disclosure, checker updates, and kernel updates before authorization by the seed or a later gate. The work defines task envelopes, typed action descriptors, forbidden matchers, object-authority probes, witness tiers, host requests, known-interface claims, complete claims, partial claims, timeout and halt outcomes, and a minimal local transition host. A concrete micro-host design is specified using canonical JSON, SHA-256 commitments, append-only records, durable flush and directory synchronization where available, deterministic checker ABI, sandbox profiles, exclusive write-surface requirements, inverse patches, timeout-bounded checks, conformance vectors, and two-slot kernel update discipline. 
The paper is intended for research on AI alignment, agent governance, runtime assurance, tool-use safety, proof-carrying control, AI auditing, human-feedback training, autonomous agents, trusted computing bases, and verifiable AI system governance. It does not claim that historical human influence can be removed from model weights. Instead, it provides an operational framework for replacing live positive approval with declared, bounded, replayable, and challengeable mechanisms for specific protected action classes.
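The chained ledger, canonical JSON, SHA-256 commitments, and durable append-only records mentioned in the abstract can be sketched as a hash-chained log in which each record commits to its predecessor, so any edit to an earlier record is detected on replay. This is a minimal sketch under stated assumptions: the record layout (`prev`, `payload`, `hash`), the all-zero genesis value, and the function names are hypothetical, and `json.dumps(sort_keys=True, separators=(",", ":"))` only approximates canonical JSON (RFC 8785 also pins down number formatting and key-ordering details).

```python
import hashlib
import json
import os
import tempfile

GENESIS = "0" * 64  # hypothetical back-link value for the first record


def canonical_json(obj) -> bytes:
    # Approximates canonical JSON: sorted keys, no insignificant whitespace, UTF-8.
    # RFC 8785 (JCS) also fixes number formatting; this sketch does not.
    return json.dumps(obj, sort_keys=True, separators=(",", ":"),
                      ensure_ascii=False).encode("utf-8")


def _fsync_dir(dirname: str) -> None:
    # Directory sync makes a newly created file's entry durable; skipped where unsupported.
    if not hasattr(os, "O_DIRECTORY"):
        return
    try:
        fd = os.open(dirname, os.O_RDONLY | os.O_DIRECTORY)
    except OSError:
        return
    try:
        os.fsync(fd)
    except OSError:
        pass  # some filesystems reject fsync on directories
    finally:
        os.close(fd)


def append_record(path: str, prev_hash: str, payload: dict) -> str:
    """Append one record that commits to prev_hash; return its SHA-256 hex digest."""
    body = {"prev": prev_hash, "payload": payload}
    digest = hashlib.sha256(canonical_json(body)).hexdigest()
    with open(path, "ab") as f:  # append-only: existing bytes are never rewritten
        f.write(canonical_json({"hash": digest, **body}) + b"\n")
        f.flush()
        os.fsync(f.fileno())  # durable flush of the file contents
    _fsync_dir(os.path.dirname(os.path.abspath(path)))
    return digest


def verify_ledger(path: str) -> bool:
    """Replay the log, recomputing every hash and checking every back-link."""
    prev = GENESIS
    with open(path, "rb") as f:
        for raw in f:
            rec = json.loads(raw)
            body = {"prev": rec["prev"], "payload": rec["payload"]}
            if rec["prev"] != prev:
                return False
            if hashlib.sha256(canonical_json(body)).hexdigest() != rec["hash"]:
                return False
            prev = rec["hash"]
    return True


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "ledger.jsonl")
    h1 = append_record(path, GENESIS, {"action": "boot", "next": "run_checker"})
    h2 = append_record(path, h1, {"action": "run_checker", "outcome": "complete"})
    print(verify_ledger(path))  # True
```

In this sketch, replaying the log with `verify_ledger` needs only the file and SHA-256, not the agent, which is one way to make records replayable and challengeable by a third party.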
