As AI systems increasingly rely on private and proprietary data, conventional data anonymization and tokenization techniques have proven insufficient to prevent information leakage during retrieval-augmented generation (RAG) and autonomous AI agent execution. Surface-level identifier masking leaves the deeper semantic structure of domain knowledge (decision rules, scoring formulas, and operational procedures) fully exposed to reconstruction through iterative querying. This paper presents Tokenis, a cryptographic pseudonymization framework that combines Named Entity Recognition (NER)-based data selection with elliptic-curve ElGamal (EC-ElGamal) encryption to protect both personal data and high-value domain ideas. Unlike existing approaches that focus solely on personally identifiable information (PII), Tokenis introduces an idea protection protocol that prevents the reconstruction and replication of proprietary rules, strategies, and algorithms while still enabling AI-driven reasoning over the protected knowledge. We present a formal security model grounded in the Decisional Diffie-Hellman (DDH) assumption, prove three core security properties (PII confidentiality, idea reconstruction resistance, and adaptive query safety), and demonstrate end-to-end integration with the TorusDB RAG platform. Tokenis serves as a foundational privacy layer for encrypted RAG systems and autonomous AI agencies, enabling controlled utilisation of sensitive domain knowledge without exposing it to the model.
Shim et al. (Fri,) studied this question.