Retrieval-augmented generation (RAG) improves the factual consistency, knowledge timeliness, and scenario adaptability of large model inference services by incorporating external knowledge. However, it also introduces structural privacy risks, including private-knowledge leakage, prompt injection, and progressive information extraction in multi-turn interactions. To address these issues, this paper proposes Private-RAG, a privacy-preserving retrieval-augmented generation method for large model inference. The method constructs a composite threat model and a quantitative evaluation framework for the RAG pipeline. It then builds a layered collaborative defense mechanism consisting of controlled retrieval, sensitivity-aware context minimization, structured prompt isolation, and multi-criterion output gating. In addition, a risk-feedback-driven budget accounting method enables dynamic risk control in multi-turn interaction scenarios. Experimental results show that Private-RAG effectively reduces private-knowledge leakage, improves robustness against prompt injection, and suppresses cumulative privacy exposure while maintaining question-answering utility and a controllable deployment latency (1165 ms in the reported experiments), demonstrating superior privacy protection and inference robustness.
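The abstract names a risk-feedback-driven budget accounting method and multi-criterion output gating but does not give their rules. The Python sketch below shows one plausible shape for them and should not be read as Private-RAG's actual algorithm. Everything in it is a hypothetical illustration: the class `SessionBudget`, the methods `gate` and `retrieval_top_k`, the per-turn cap `max_turn_risk`, and the assumption that each candidate response has already been assigned a scalar risk score in [0, 1]. Under those assumptions, each released response is charged against a per-session budget. A response is blocked if it exceeds a per-turn cap or the remaining budget. As the budget is consumed, the budget feeds back into retrieval by shrinking the number of retrieved passages.

```python
from dataclasses import dataclass, field


@dataclass
class SessionBudget:
    """Hypothetical per-session privacy budget for multi-turn dialogue (illustrative only)."""
    total: float = 1.0                            # assumed total risk budget per session
    spent: float = 0.0                            # cumulative exposure charged so far
    history: list = field(default_factory=list)   # (risk, released?) per turn

    def remaining(self) -> float:
        """Budget still available for future turns (never negative)."""
        return max(self.total - self.spent, 0.0)

    def retrieval_top_k(self, base_k: int = 5) -> int:
        """Assumed feedback rule: retrieve fewer passages as the budget is used up."""
        frac = self.remaining() / self.total
        return max(1, int(base_k * frac))

    def gate(self, risk: float, max_turn_risk: float = 0.5) -> bool:
        """Return True if a response with estimated `risk` may be released.

        Two assumed criteria: a per-turn risk cap and the remaining cumulative
        budget. Only released responses are charged against the budget.
        """
        allowed = risk <= max_turn_risk and risk <= self.remaining()
        if allowed:
            self.spent += risk
        self.history.append((risk, allowed))
        return allowed


if __name__ == "__main__":
    budget = SessionBudget(total=1.0)
    print(budget.retrieval_top_k())  # 5: full budget, full retrieval breadth
    # Turn 3 exceeds the per-turn cap; turn 5 exceeds the remaining budget.
    print([budget.gate(r) for r in [0.3, 0.3, 0.6, 0.3, 0.3]])
    # [True, True, False, True, False]
```

Making the budget stateful per session, rather than judging each turn in isolation, is what lets a scheme like this resist progressive extraction. A sequence of individually low-risk answers is still eventually cut off once their summed exposure reaches the budget.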
Yang et al. studied this question.