What question did this study set out to answer?

This research examines whether provenance detection generalizes across neural network model families and if API verification is effective in large-scale deployments.

March 8, 2026Open Access

Provenance Generalization and Verification Scaling for Neural Network Forensics

Key Points

This research examines whether provenance detection generalizes across neural network model families and if API verification is effective in large-scale deployments.
Conducted four coordinated studies on distilled model checkpoints across various architectures and protocols.
Analyzed provenance transfer through weight-geometry and API-logprob methods while training multiple teacher-student model pairs.
Developed and tested a new directional diagnostic for provenance detection in inner-product spaces.
Expanded API endpoint verification to include 14 models from multiple commercial providers.
Provenance transfer was found to generalize across multiple teacher-student pairs with high cosine alignment metrics.
API verification showed no breaches across 182 pairwise comparisons, establishing new thresholds for reliable verification.
A three-tier zero-knowledge attestation architecture was proposed to address trust issues in model forensics.

Abstract

Prior work established that knowledge distillation transfers a detectable provenance trace from teacher to student models, and that API endpoint verification can identify models through logprob order-statistic geometry. Both results were demonstrated on single teacher-student pairs and a six-model API zoo, leaving open whether provenance detection generalizes across model families and whether API verification scales to production-density endpoint populations. We address both questions through a coordinated experimental program spanning four studies. In the first study, we train 24 distilled checkpoints across 7 experimental arms — 3 teacher families (Qwen, Mistral, Llama), 4 student architectures (Qwen-0. 5B, Qwen-1. 5B, Llama-1B, Gemma-2B), and 2 training protocols (logit-level knowledge distillation and cross-tokenizer supervised fine-tuning) — measuring provenance transfer in both the weight-geometry and API-logprob regimes. Provenance transfer generalizes across the tested matrix: all 14 mature-epoch checkpoints show directional coupling to the teacher (cosine alignment cosθ > 0. 8, with 13 of 14 exceeding 0. 85). The strongest signal arises in a cross-family arm (Mistral-7B → Llama-1B, scalar convergence 0. 858) that is inconsistent with a purely family-restricted transfer hypothesis within the tested matrix. The normalized third logit gap δₙorm remains within 1. 4% coefficient of variation across all 31 checkpoints and 4 student architectures — the tightest confirmation of Gumbel-class universality in this experimental program. An extension to mixture-of-experts architecture (Mixtral-8x7B, δₙorm = 0. 309) confirms that the universal constant persists under sparse expert routing. In the second contribution, we identify a systematic failure mode of scalar provenance metrics and introduce the geometrically correct directional diagnostic for provenance detection in inner-product spaces. The standard scalar convergence metric ConvT conflates direction and magnitude into a single value, discarding the directional information that provenance detection requires. In two independent experiments, this produced misleading conclusions: a false spoofing signal (R² = 0. 995 of apparent cross-family convergence explained by pure knowledge distillation geometry, with the adversarial gradient contributing 4. 8%) and a false failure signal (negative ConvT despite consistent directional coupling at cosθ = 0. 91). The alignment diagnostic applies the law of cosines in PPP-residual template space (vectors in RK with Euclidean distance) to decompose student movement into direction and magnitude, preserving the provenance signal that scalar distance metrics destroy. We establish a measurability threshold: when the baseline-to-teacher distance d (B, T) falls below approximately 1. 0, scalar ConvT becomes unreliable and the directional diagnostic becomes the primary metric. This diagnostic applies to any distillation forensics framework that measures convergence in an inner-product space. In the third contribution, we extend API endpoint verification from 6 models to 14 across 3 commercial providers (OpenAI, Google Vertex AI, xAI), observing zero breaches across 182 pairwise impostor comparisons under per-model adaptive thresholds and three independent enrollment sessions, with a centroid reference protocol (CRP) that replaces the centroid L² metric, which produces false breaches at 14-model density. We establish a minimum truncation floor: API endpoints exposing fewer than 7 logprob ranks cannot support reliable verification (signal collapses within one rank of this boundary). Speculative decoding — an increasingly common inference optimization — is shown to be transparent to the verification protocol, with the speculative-decoded fingerprint deviating from the verifier-only fingerprint by 10. 6% of the inter-model distance. Finally, we formalize the Trust Paradox in model forensics — a victim cannot prove weight theft without disclosing weights, and a suspect cannot prove innocence without disclosing training data — and propose a three-tier zero-knowledge attestation architecture that addresses it. The first tier (committed distance proof) enables a model owner to prove fingerprint proximity to a public anchor without revealing the fingerprint vector, using standard cryptographic commitments with verifier-controlled thresholds. The second tier (hardware-attested measurement) removes the requirement that the prover be trusted to compute the fingerprint correctly, binding the measurement to a trusted execution environment attestation. The third tier (full zero-knowledge extraction) would eliminate all trust assumptions beyond cryptographic soundness; we present this as an open problem with pre-registered falsification criteria, including a fixed-point precision gate derived from the minimum pairwise separation in the existing 23-model zoo. The architecture defines eight properties that a meaningful zero-knowledge model identity proof must satisfy — extending the formal verification doctrine (311 + 41 = 352 theorems across 17 Coq proof files 1, 2, 0 Admitted) into the cryptographic regime — and six explicit trust assumptions under which the proof statements hold. This framework is validated through two of three tiers: the Tier 1 committed distance proof has been implemented and hardened, and the Tier 2 hardware-attested measurement has been validated on production confidential computing hardware (6 models, 1, 536 measurements, 0 failures inside an H100 trusted execution environment, with both CPU and GPU attestation tokens bound to a common cryptographic root and structural fingerprints transparent to confidential computing mode). Tier 3 is proposed, and its falsification criteria are stated. The experimental results in this paper are grounded in the formal verification stack and measurement infrastructure described in the companion papers 1, 2, 3. All provenance claims are classified as VALIDATED (empirical) ; the zero-knowledge architecture is classified as VALIDATED for Tier 1 (committed distance proof) and PROPOSED for Tiers 2 and 3. The authentication protocol and measurement methodology are the subject of U. S. Provisional Patent Applications 63/982, 893 and 63/990, 487; the zero-knowledge attestation architecture is the subject of U. S. Provisional Patent Application 63/996, 680.

Provenance Generalization and Verification Scaling for Neural Network Forensics

Key Points

Abstract

Cite This Study