This working paper identifies a failure mode in frontier AI safety infrastructure that is distinct from model collapse, benchmark contamination, or distribution drift. It describes a functional shift in evaluation categories: the boundaries persist while what they separate becomes progressively opaque. The argument proceeds in three movements. First, three failure conditions under which any category loses transparency are specified: self-reference, anchor drift, and proxy displacement. These conditions are demonstrated on the 2008 collapse of the Gaussian copula in credit markets, drawing on the co-production analysis of MacKenzie & Spears (2014). Second, the same conditions are shown to be assembling in ML training and evaluation infrastructure along three reinforcing paths: recursive synthetic injection (reifying self-reference), shared evaluation lineage (reifying anchor drift), and proxy signal amplification (reifying proxy displacement). Each path directly instantiates one failure condition, and together they produce all three jointly across the categories "training data," "evaluation benchmark," and "human feedback." Third, three externality conditions are derived as the structural inverses of the failure conditions: provenance independence, external anchoring sustained by incentive misalignment, and speed parity. Four current safety approaches, namely UK/US AI Safety Institutes, interpretability research, open-weight release, and cryptographic attestation (zkML), are mapped against these conditions, and each reads as a specific trade-off rather than an uncategorized failure. No current approach internal to the ML substrate satisfies all three conditions simultaneously, and no current non-ML automation track (formal verification, static analysis of learned weights, symbolic AI) closes the gap left by substrate sharing on the speed dimension.
The closing sections examine why the dominant "safety as a technical problem" framing has structural difficulty recognizing this failure mode, tracing the limited visibility to the signal-to-noise structure of self-referential evaluation. Three partial external channels (physical-feedback loops, hardware attestation, and cryptographically certified human authorship) are sketched as starting points for the substrate-external audit infrastructure the paper argues is required.
Ghjuvan Ortulanu
NatureServe
Ghjuvan Ortulanu (Fri,) studied this question.
synapsesocial.com/papers/69e472fc010ef96374d8ed6d — DOI: https://doi.org/10.5281/zenodo.19627562