Agentic verification systems increasingly rely on reviewer models, tool calls, gates, and repeated audit loops. These systems can check many properties of a declared task representation, but the representation itself must first make the relevant properties available for audit. This paper argues that agentic verification requires a distinct role of specification authority: the function that determines what the verifier must be able to inspect before verification begins. The argument extends the specification-boundary framework introduced in "Why Sense Matters" and the specification-boundary engineering results of "How Sense Works." Through controlled reviewer tests derived from citation-verification failures, the paper shows that audit reliability depends on more than adding another reviewer. In a fix-verify architecture test, centralized self-review returned PASS while separated verification with a claim-evidence map returned HOLD on an unresolved historical-reference disclosure problem. In a 12-output 2 x 2 replication, PASS/HOLD verdicts varied across repetitions, but claim-evidence maps more reliably localized the historical-reference-file manifest and answer-key/seeding fields as the relevant evidence object. The paper then reports a 3 x 3 diagnostic reviewer matrix crossing three reviewer rows with three blinded boundary-defect stimuli. All nine cells produced HOLD; the same verdict label carried different diagnostic content across reviewers. Visible direct answer-key exposure and visible proxy-field leakage were precisely identified by some reviewer rows, while the omitted historical field manifest received only broad missing-evidence treatment. A targeted rule-strength expansion then isolates a consequence-layer effect: in a fresh paired OpenAI API rerun using the dated snapshot gpt-5.5-2026-04-23, a weak asymmetric rule produced PASS with the clean-reference manifest marked PARTIAL, while a strong symmetric rule assigning release-blocking force to the same gap produced HOLD with blocker CLEANMANIFEST. A schema-controlled homogeneous replication strengthened and narrowed this result: GPT-5.5 reproduced the clean flip across three paired reruns, while Grok 4.3 and DeepSeek routes showed model-contingent over-enforcement and field-level inconsistencies. The contribution is a role-level account of agentic verification: reliable audit requires separated definition, execution, verification, and release functions, bound by explicit evidence maps, output schemas, and consequence rules that name which prohibited fields, proxy fields, missing manifests, and evidence objects block interpretation.
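The consequence-layer effect can be illustrated with a minimal sketch in which the same evidence map yields PASS or HOLD depending only on whether the rule gives a PARTIAL gap release-blocking force. This is a deterministic toy under stated assumptions, not the paper's reviewer prompts or schema: the names `Status`, `ConsequenceRule`, `verdict`, `WEAK_RULE`, `STRONG_RULE`, and the `ANSWER_KEY_SEEDING` field are hypothetical, and the sketch does not attempt to model the symmetric/asymmetric distinction or the behavior of any model-based reviewer. Only the PASS/HOLD labels, the PARTIAL marking, and the CLEANMANIFEST blocker come from the text above.

```python
from dataclasses import dataclass
from enum import Enum


class Status(Enum):
    """Audit status of one evidence object (hypothetical vocabulary)."""
    PRESENT = "PRESENT"
    PARTIAL = "PARTIAL"
    MISSING = "MISSING"


@dataclass(frozen=True)
class ConsequenceRule:
    """Names which evidence statuses block release (illustrative only)."""
    name: str
    blocking_statuses: frozenset


def verdict(evidence_map, rule):
    """Return (label, blockers) for an evidence map under a consequence rule."""
    blockers = sorted(
        obj for obj, status in evidence_map.items()
        if status in rule.blocking_statuses
    )
    return ("HOLD" if blockers else "PASS", blockers)


# Assumption: the weak rule lets a PARTIAL gap through; the strong rule
# assigns release-blocking force to the same gap.
WEAK_RULE = ConsequenceRule("weak", frozenset({Status.MISSING}))
STRONG_RULE = ConsequenceRule("strong", frozenset({Status.MISSING, Status.PARTIAL}))

# Same evidence map for both runs; only the rule changes.
evidence_map = {
    "CLEANMANIFEST": Status.PARTIAL,
    "ANSWER_KEY_SEEDING": Status.PRESENT,  # hypothetical field name
}

weak_result = verdict(evidence_map, WEAK_RULE)
strong_result = verdict(evidence_map, STRONG_RULE)
print(weak_result)
print(strong_result)
```

The point of the sketch is that the evidence map is held fixed across both calls, so any change in the verdict is attributable to the consequence rule rather than to what the verifier was able to inspect.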
Jianeng Zhou (Thu,) studied this question.