ML and agentic AI evaluations certify systems on outcome metrics (accuracy, calibration, task completion) and treat those metrics as evidence of reliable computation. We argue this rests on a substitution: outcome agreement stands in for computational admissibility, the condition that every transition remained within the validity domain V defining the system's transition rule. Two trajectories, one admissible and one not, can produce identical metric values (Observation 1), so terminal-state metrics cannot determine admissibility. This is a near-definitional fact whose operational absence we trace to three ML-specific conditions: implicit V, silent failure, and an empirical evaluation culture. Agentic AI is where the gap is widest, because model and API updates silently shift V between evaluations. Our contribution is architectural. If admissibility cannot be recovered after execution, it must be enforced before: we restrict which transitions become registered system actions (the attributable units of behavior) rather than evaluating them afterward. A transition failing its pre-execution precondition acquires no action status at all, a stronger intervention than shielding's block-and-substitute. Where V is explicit, a pre-action input gate suffices; where it is not, a multi-domain paradox-state gate substitutes a cross-domain compatibility rule set R for full V enumeration, illustrated by biometric multi-factor verification and agentic tool-use gating. A necessary-condition bound (Proposition 1) shows false release scaling as C(n,k)·pᵏ under domain independence. Because gate conditions live outside the model, the framework tracks V across the model evolution that defeats outcome metrics. The paper makes one universal negative claim (Observation 1), one conditional constructive claim with explicit V (§5.3), and one with an R-proxy (§5.5), the last quantified by Proposition 1.
Appendix A numerically verifies the bound, quantifies its degradation under shared-substrate dependence (§A.5), and confirms its robustness under real LFW face match scores (§A.6); no full deployed-system evaluation is reported.
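The C(n,k)·pᵏ scaling in Proposition 1 can be explored numerically with a minimal sketch. The reading used here is an assumption, not taken from the paper's proof: n independent domains each pass erroneously with probability p, and a false release requires at least k of them to pass erroneously at once. Under that reading, C(n,k)·pᵏ is a union bound over all k-subsets of domains, and it can be compared with the exact binomial tail. The function names `union_bound` and `exact_tail` are hypothetical and the parameter values are purely illustrative.

```python
from math import comb


def union_bound(n: int, k: int, p: float) -> float:
    """Upper bound C(n,k) * p^k on the probability that at least k of n
    independent domains pass erroneously (assumed reading of the bound)."""
    return comb(n, k) * p ** k


def exact_tail(n: int, k: int, p: float) -> float:
    """Exact probability that at least k of n independent domains pass
    erroneously, each with probability p (binomial upper tail)."""
    return sum(comb(n, j) * p ** j * (1 - p) ** (n - j) for j in range(k, n + 1))


if __name__ == "__main__":
    # Illustrative values only; none of these come from the paper.
    p = 0.01
    for n, k in [(3, 2), (5, 3), (5, 5)]:
        bound = union_bound(n, k, p)
        exact = exact_tail(n, k, p)
        print(f"n={n} k={k} p={p}: bound={bound:.3e} exact={exact:.3e}")
```

The bound is tight when p is small and loosens as p grows. It relies on independence across domains, which is the assumption the abstract says degrades under shared-substrate dependence (§A.5).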
JULGI KANG (Sat,) studied this question.
synapsesocial.com/papers/6a1d22db02fbce9130638947 — DOI: https://doi.org/10.5281/zenodo.20464667
JULGI KANG
Association for Symbolic Logic