Agent skills (structured packages of instructions, scripts, and references that augment a large language model (LLM) without modifying the model itself) have moved from a convenience to a first-class deployment artifact. The runtime that loads them inherits the same problem package managers and operating systems have always faced: a piece of content claims a behavior, and the runtime must decide whether to believe it. We state this paper’s central thesis up front: a skill is untrusted code until it is verified, and the runtime that loads it must enforce that default rather than infer trust from a signature, a clearance, or a registry of origin. Without skill verification, a human-in-the-loop (HITL) gate must fire on every irreversible call, which is operationally untenable and degrades into rubber-stamping at any non-trivial scale. With skill verification treated as a separate, gated process, HITL fires only for what is unverified, and the system becomes sustainable. We give a trust schema (§3) that includes an explicit verification level on every skill manifest; a capability gate (§4) whose HITL policy is a function of that verification level; a biconditional correctness criterion (§5) that any candidate verification procedure must satisfy on an adversarial-ensemble exercise (§6); and a portable runtime profile (§7) with ten normative guidelines abstracted from a working open-source reference implementation [13]. The contribution is harness- and model-agnostic; nothing here requires retraining, fine-tuning, or proprietary infrastructure.
Alfredo Metere (Thu,) studied this question.
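
The gating rule summarized above (HITL fires only for irreversible calls made by unverified skills) can be illustrated with a minimal Python sketch. It is not the reference implementation: the names `VerificationLevel`, `SkillManifest`, `Call`, and `requires_hitl`, the two-level scale, and the choice to let reversible calls pass without review are all assumptions made for illustration, and the paper's actual schema (§3) and gate (§4) may differ.

```python
from dataclasses import dataclass
from enum import IntEnum


class VerificationLevel(IntEnum):
    # Hypothetical two-level scale; the real schema may define more levels.
    UNVERIFIED = 0
    VERIFIED = 1


@dataclass(frozen=True)
class SkillManifest:
    name: str
    # Default is untrusted: a manifest that says nothing is treated as unverified.
    verification_level: VerificationLevel = VerificationLevel.UNVERIFIED


@dataclass(frozen=True)
class Call:
    action: str
    irreversible: bool


def requires_hitl(skill: SkillManifest, call: Call) -> bool:
    """Return True when a human must approve the call before it runs."""
    if not call.irreversible:
        # Assumption: reversible calls never need human review.
        return False
    # Irreversible calls are gated unless the skill has been verified.
    return skill.verification_level < VerificationLevel.VERIFIED


if __name__ == "__main__":
    unverified = SkillManifest("pdf-export")
    verified = SkillManifest("pdf-export", VerificationLevel.VERIFIED)
    delete = Call("delete_file", irreversible=True)
    print(requires_hitl(unverified, delete))  # True
    print(requires_hitl(verified, delete))    # False
```

Note that the sketch keys the decision only on the manifest's verification level; a signature or registry of origin plays no part in it, consistent with the thesis that trust is not inferred from provenance.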