Recent disclosures of industrial-scale knowledge distillation — including campaigns comprising millions of fraudulent API exchanges targeting frontier models Anthropic, 2026 — have made post-hoc detection of model theft a critical security requirement. Building on a formally-verified framework of log-prob order-statistic geometry, we investigate the adversarial resilience of neural network identity across 72 experimental checkpoints. We establish a Two-Layer Identity Hypothesis: a model’s structural identity (weights-regime geometry) is empirically invariant to distillation (within acceptance threshold epsilon across all 18 protocols), while its functional identity (API-regime Poisson Point Process residuals) predictably transfers to the student, converging up to 52% toward the teacher’s template. Stress-testing this forensic channel against a white-box adversary, we find that functional provenance is geometrically coupled to the knowledge transfer objective. Adversarial erasure gradients are consistently dominated by the distillation loss, achieving only a transient suppression that rebounds within one epoch. Passive fine-tuning on fresh data erases the trace more effectively than any adversarial method, but at a measurable cost to general capability — revealing a Pareto frontier with no favorable region for the adversary. This establishes API forensics as a time-sensitive detective control (“The Tripwire”) and weights-regime identity as the immutable anchor (“The Vault”). Finally, we observe an apparent vulnerability: a cross-family adversarial spoofing attack achieves 69. 4% convergence toward a decoy’s fingerprint, while same-family spoofing catastrophically fails. We resolve this paradox by mapping the PPP-residual vector space, revealing that models cluster by capability topology, not corporate lineage. Cross-family “spoofing” is a spatial illusion caused by a narrow 7. 8 degree alignment between the decoy and the primary distillation trajectory (R2 = 0. 995), whereas same-family decoys are anti-aligned. Across all adversarial interventions, the underlying Gumbel universality (deltaₙorm) remains invariant (CV = 1. 9%). We conclude that during active distillation, an adversary cannot simultaneously acquire a teacher’s capabilities and erase or redirect the forensic trace. In this setting, the geometry forbids it.
Anthony Coslett (Sat,) studied this question.