What question did this study set out to answer?

This research examines the inadequacies of current outcome metrics in evaluating agentic AI systems and proposes a pre-execution action restriction method.

June 1, 2026 · Open Access

Pre-Execution Restriction for Agentic AI: Registering Actions by Admissibility, Not Outcome Metrics

Key Points

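Traced the gap between outcome metrics and admissibility to three ML-specific conditions: implicit validity domains, silent failure, and an empirical evaluation culture.
Developed a framework that enforces admissibility before a transition is registered as a system action, rather than evaluating it after execution.
Numerically verified the false-release bound in Appendix A, including its degradation under shared-substrate dependence and its robustness on real LFW face match scores.
Showed that outcome metrics can mislead about admissibility: an admissible and an inadmissible trajectory can produce identical metric values (Observation 1).
Argued that pre-action restriction can improve system reliability by tracking the validity domain across model and API updates; no full deployed-system evaluation is reported.
Quantified false release under domain independence as scaling with C(n,k)·pᵏ (Proposition 1), supporting stricter preconditions for agentic AI.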

Abstract

ML and agentic AI evaluation certify systems on outcome metrics — accuracy, calibration, task completion — and treat them as evidence of reliable computation. We argue this rests on a substitution: outcome agreement standing in for computational admissibility, the condition that every transition remained within the validity domain V defining the system's transition rule. Two trajectories, one admissible and one not, can produce identical metric values (Observation 1), so terminal-state metrics cannot determine admissibility — a near-definitional fact whose operational absence we trace to three ML-specific conditions: implicit V, silent failure, and an empirical evaluation culture. Agentic AI is where the gap is widest, because model and API updates silently shift V between evaluations. Our contribution is architectural. If admissibility cannot be recovered after execution, it must be enforced before: we restrict which transitions become registered system actions — the attributable units of behavior — rather than evaluating them afterward. A transition failing its pre-execution precondition acquires no action status at all, a stronger intervention than shielding's block-and-substitute. Where V is explicit a pre-action input gate suffices; where it is not, a multi-domain paradox-state gate substitutes a cross-domain compatibility rule set R for full V enumeration, illustrated by biometric multi-factor verification and agentic tool-use gating. A necessary-condition bound (Proposition 1) shows false release scaling as C(n,k)·pᵏ under domain independence. Because gate conditions live outside the model, the framework tracks V across the model evolution that defeats outcome metrics. The paper makes one universal negative claim (Observation 1), one conditional constructive claim with explicit V (§5.3), and one with R-proxy (§5.5), the last quantified by Proposition 1. 
Appendix A numerically verifies the bound, quantifies its degradation under shared-substrate dependence (§A.5), and confirms its robustness under real LFW face match scores (§A.6); no full deployed-system evaluation is reported.
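The scaling in Proposition 1 can be illustrated with a small numerical check. The sketch below is not the paper's Appendix A code. It assumes one plausible reading of the symbols, which the abstract does not define: `n` is the number of independent domains, `k` is the number of domains that must falsely accept for a release to occur, and `p` is the per-domain false-accept probability. Under that reading, C(n,k)·pᵏ is the standard union bound on the probability that at least `k` of `n` independent checks falsely accept. The function names are hypothetical.

```python
from math import comb


def false_release_bound(n: int, k: int, p: float) -> float:
    """C(n, k) * p**k, the form quoted in the abstract (assumed reading of n, k, p)."""
    return comb(n, k) * p ** k


def exact_false_release(n: int, k: int, p: float) -> float:
    """Exact probability that at least k of n independent domains falsely accept."""
    return sum(
        comb(n, j) * p ** j * (1 - p) ** (n - j)
        for j in range(k, n + 1)
    )


if __name__ == "__main__":
    # Illustrative parameter choices only; they are not taken from the paper.
    for n, k, p in [(3, 2, 0.01), (5, 3, 0.05)]:
        exact = exact_false_release(n, k, p)
        bound = false_release_bound(n, k, p)
        print(f"n={n} k={k} p={p}: exact={exact:.3e} bound={bound:.3e}")
```

Under independence the exact tail probability never exceeds the bound, and the two are close when `p` is small. The sketch does not model the shared-substrate dependence that the abstract says degrades the bound.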

Authors

JULGI KANG

Association for Symbolic Logic

Institutions

Association for Symbolic Logic
