June 12, 2026 | Open Access

Private-RAG: A Privacy-Preserving Retrieval-Augmented Generation Method for Large Model Inference

Key Points

This research addresses privacy risks associated with retrieval-augmented generation in large model inference.
The authors developed Private-RAG, built on a composite threat model and a quantitative evaluation framework for the RAG pipeline.
The method implements a layered collaborative defense mechanism that combines controlled retrieval, sensitivity-aware context minimization, structured prompt isolation, and multi-criterion output gating.
A risk feedback-driven budget accounting method provides dynamic risk control in multi-turn interactions.
Private-RAG significantly reduced private-knowledge leakage and improved robustness against prompt injection.
Cumulative privacy exposure was suppressed while maintaining question-answering utility.
Deployment latency remained controllable, at approximately 1165 ms.

Abstract

Retrieval-augmented generation improves the factual consistency, knowledge timeliness, and scenario adaptability of large model inference services by incorporating external knowledge. However, it also introduces structural privacy risks, including private-knowledge leakage, prompt injection, and progressive information extraction in multi-turn interactions. To address these issues, this paper proposes Private-RAG, a privacy-preserving retrieval-augmented generation method for large model inference. The method constructs a composite threat model and a quantitative evaluation framework for the RAG pipeline, and further develops a layered collaborative defense mechanism consisting of controlled retrieval, sensitivity-aware context minimization, structured prompt isolation, and multi-criterion output gating. In addition, a risk feedback-driven budget accounting method is introduced to enable dynamic risk control in multi-turn interaction scenarios. Experimental results show that Private-RAG effectively reduces private-knowledge leakage, improves robustness against prompt injection, and suppresses cumulative privacy exposure while maintaining question-answering utility and a controllable deployment latency (e.g., 1165 ms), demonstrating superior privacy protection and inference robustness.
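
The abstract does not specify how the risk feedback-driven budget accounting works, so the following is only a minimal sketch of one way such a mechanism could look, not the paper's formulation. It assumes each turn receives a risk score in [0, 1] from some upstream detector, that a session has a fixed exposure budget, and that the cost of a turn grows with the average risk of the most recent turns (the "feedback"). The class name `RiskBudget` and the parameters `total`, `feedback_gain`, and the three-turn window are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class RiskBudget:
    """Hypothetical per-session privacy budget (illustrative, not from the paper)."""
    total: float = 1.0          # assumed exposure allowed per session
    feedback_gain: float = 0.5  # assumed weight of past risk on future cost
    spent: float = 0.0
    history: list = field(default_factory=list)  # risk scores of admitted turns

    def cost(self, risk_score: float) -> float:
        """Charge more for a turn when the last few admitted turns were risky."""
        recent = self.history[-3:]
        avg_recent = sum(recent) / len(recent) if recent else 0.0
        return risk_score * (1.0 + self.feedback_gain * avg_recent)

    def admit(self, risk_score: float) -> bool:
        """Charge the budget and return True if the turn fits, else refuse."""
        if not 0.0 <= risk_score <= 1.0:
            raise ValueError("risk_score must be in [0, 1]")
        charge = self.cost(risk_score)
        if self.spent + charge > self.total:
            return False  # budget exhausted: the turn is refused, nothing is charged
        self.spent += charge
        self.history.append(risk_score)
        return True


# Four turns with assumed risk scores; later turns cost more because of feedback.
budget = RiskBudget(total=1.0)
decisions = [budget.admit(r) for r in (0.2, 0.4, 0.4, 0.4)]
print(decisions, round(budget.spent, 2))
```

In this sketch a sequence of individually moderate queries is eventually refused once the accumulated, feedback-weighted cost would exceed the session budget, which is the general behavior a defense against progressive multi-turn extraction would need.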
