What question did this study set out to answer?

The research aims to address the challenge of verifying AI training data compliance while protecting trade secrets.

March 25, 2026 | Open Access

Attribution Without Disclosure: Zero-Knowledge Proofs of Semantic Non-Membership for AI Training Data Compliance

Key Points

Proposes an architecture for Zero-Knowledge Semantic Non-Membership (ZK-SNM) verification.
Identifies the technical challenges of computing semantic similarity inside zero-knowledge proofs, chiefly the cost of similarity search within ZK circuits.
Proposes locality-sensitive hashing to reduce that computational cost.
Proposes hierarchical verification as a further mitigation strategy.
Establishes a framework for proving semantic non-membership without revealing training data.
Identifies a gap in the literature: no existing system combines semantic fingerprinting with zero-knowledge proofs.

Abstract

Current approaches to verifying AI training data compliance face a fundamental tension: copyright holders need to know whether their content was used in training (EU AI Act, Article 53(1)(d)), while model providers need to protect their training data as trade secrets (GDPR, trade secret law). Existing zero-knowledge proof systems for machine learning (ZKML) address this tension only partially, by proving non-membership of exact data points. However, real-world training pipelines involve tokenization, chunking, paraphrasing, and augmentation, which makes exact-match proofs insufficient. We identify a gap in the literature: no existing system combines semantic fingerprinting with zero-knowledge proofs to enable semantic non-membership verification. We propose an architecture for Zero-Knowledge Semantic Non-Membership (ZK-SNM) that enables a model provider to prove, without revealing any training data, that no document in its training corpus has a semantic similarity to a queried document that exceeds a specified threshold. We discuss the technical challenges, including the computational cost of similarity search within ZK circuits, and propose mitigation strategies based on locality-sensitive hashing and hierarchical verification. This position paper establishes the problem formulation and the proposed architecture; experimental validation is left to subsequent work.
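To make the claimed statement concrete, the following is a minimal plaintext sketch of the predicate that a ZK-SNM proof would attest to: no corpus document reaches a similarity threshold against the query. It is not the paper's construction and contains no zero-knowledge machinery. The toy hashed bag-of-words embedding, the choice of cosine similarity, the random-hyperplane (SimHash) form of locality-sensitive hashing, the two-stage coarse-then-exact check, and all names and parameters (`embed`, `simhash`, `semantic_non_member`, `tau`, `max_hamming`) are assumptions made for illustration.

```python
import hashlib
import math
import random


def embed(text, dim=256):
    """Toy stand-in for a semantic embedding (assumption): hashed bag of words, unit length."""
    vec = [0.0] * dim
    for token in text.lower().split():
        digest = hashlib.sha256(token.encode("utf-8")).digest()
        vec[int.from_bytes(digest[:8], "big") % dim] += 1.0
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]


def make_planes(n_bits, dim=256, seed=0):
    """Random hyperplanes for the LSH signature (fixed seed for reproducibility)."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_bits)]


def simhash(vec, planes):
    """Random-hyperplane LSH: one sign bit per hyperplane."""
    return tuple(int(sum(p * x for p, x in zip(plane, vec)) >= 0.0) for plane in planes)


def hamming(a, b):
    """Number of positions at which two signatures differ."""
    return sum(x != y for x, y in zip(a, b))


def cosine(u, v):
    """Cosine similarity; inputs are assumed to be unit length already."""
    return sum(a * b for a, b in zip(u, v))


def semantic_non_member(query, corpus, tau, planes, max_hamming):
    """True if no LSH candidate in `corpus` has cosine similarity >= tau to `query`."""
    q = embed(query)
    q_sig = simhash(q, planes)
    for doc in corpus:
        d = embed(doc)
        # Coarse stage (hypothetical): skip documents whose signature is far from the query's.
        if hamming(simhash(d, planes), q_sig) > max_hamming:
            continue
        # Fine stage: exact similarity check on the surviving candidates.
        if cosine(d, q) >= tau:
            return False
    return True


corpus = ["the cat sat on the mat", "zero knowledge proofs hide the witness"]
planes = make_planes(n_bits=16)
print(semantic_non_member("the cat sat on the mat", corpus, 0.8, planes, max_hamming=4))
print(semantic_non_member("quantum lattice simulation results", corpus, 0.8, planes, max_hamming=4))
```

In this sketch the coarse filter trades soundness for speed: an LSH signature can place a genuinely similar document outside the Hamming radius, in which case the exact check never runs on it. Setting `max_hamming` to the number of signature bits disables the filter and recovers the exhaustive check. A real system would also have to bind the corpus to a public commitment and prove the whole loop inside a circuit, neither of which is modeled here.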
