ML and agentic AI evaluation certify systems on outcome metrics — accuracy, calibration, task completion — and treat them as evidence of reliable computation. We argue this rests on a substitution: outcome agreement standing in for computational admissibility, the condition that every transition remained within the validity domain V defining the system's transition rule. Two trajectories, one admissible and one not, can produce identical metric values (Observation 1), so terminal-state metrics cannot determine admissibility — a near-definitional fact whose operational absence we trace to three ML-specific conditions: implicit V, silent failure, and an empirical evaluation culture. Agentic AI is where the gap is widest, because model and API updates silently shift V between evaluations. Our contribution is architectural. If admissibility cannot be recovered after execution, it must be enforced before: we restrict which transitions become registered system actions — the attributable units of behavior — rather than evaluating them afterward. A transition failing its pre-execution precondition acquires no action status at all, a stronger intervention than shielding's block-and-substitute. Where V is explicit a pre-action input gate suffices; where it is not, a multi-domain paradox-state gate substitutes a cross-domain compatibility rule set R for full V enumeration, illustrated by biometric multi-factor verification and agentic tool-use gating. A necessary-condition bound (Proposition 1) shows false release scaling as C(n,k)·pᵏ under domain independence. Because gate conditions live outside the model, the framework tracks V across the model evolution that defeats outcome metrics. The paper makes one universal negative claim (Observation 1), one conditional constructive claim with explicit V (§5.3), and one with R-proxy (§5.5), the last quantified by Proposition 1. 
Appendix A numerically verifies the bound, quantifies its degradation under shared-substrate dependence (§A.5), and confirms its robustness under real LFW face match scores (§A.6); no full deployed-system evaluation is reported.
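The C(n,k)·pᵏ scaling in Proposition 1 can be made concrete with a short numerical sketch. The abstract does not define the symbols, so the code assumes that n is the number of independent domains, k is the number of domains that must falsely accept for a false release to occur, and p is a common per-domain false-accept probability. Under those assumptions C(n,k)·pᵏ is the union bound on the exact binomial tail, which the sketch computes alongside it for comparison. The function names are hypothetical and this is not the paper's Appendix A code.

```python
from math import comb


def false_release_bound(n: int, k: int, p: float) -> float:
    """Union bound C(n, k) * p**k on P(at least k of n domains falsely accept).

    Assumed reading of the symbols: n independent domains, each falsely
    accepting with probability p, and a false release needing at least k.
    """
    if not 0 <= k <= n:
        raise ValueError("k must satisfy 0 <= k <= n")
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be a probability")
    return comb(n, k) * p**k


def exact_false_release(n: int, k: int, p: float) -> float:
    """Exact binomial tail P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, j) * p**j * (1 - p) ** (n - j) for j in range(k, n + 1))


if __name__ == "__main__":
    p = 0.01  # illustrative per-domain false-accept rate, not a value from the paper
    for n, k in [(3, 2), (5, 3), (5, 5)]:
        bound = false_release_bound(n, k, p)
        exact = exact_false_release(n, k, p)
        print(f"n={n} k={k} bound={bound:.3e} exact={exact:.3e}")
```

For small p the bound and the exact tail are close; for large p the bound can exceed 1 and stops being informative. The sketch says nothing about the shared-substrate dependence treated in §A.5, where the independence assumption does not hold.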
JULGI KANG
Association for Symbolic Logic
synapsesocial.com/papers/6a1d22db02fbce9130638947 — DOI: https://doi.org/10.5281/zenodo.20464667