What question did this study set out to answer?

The aim is to develop a framework that protects sensitive information during the retrieval process in Retrieval-Augmented Generation.

June 12, 2026 | Open Access

SD-RAG: A Framework for Secure Selective Disclosure in Retrieval-Augmented Generation against Single-turn Prompt-Leaking Attacks

Key Points

Aimed to protect sensitive information during the retrieval process in Retrieval-Augmented Generation (RAG).
Proposed SD-RAG, a framework that applies pre-redaction and enforces privacy constraints during retrieval, before sensitive data reaches the question-answering LLM.
Introduced a graph-based data model that supports fine-grained, policy-aware retrieval under dynamic security and privacy constraints.
Evaluated the framework in single-turn scenarios against prompt-leaking attacks.
Achieved up to a 58% increase in the keyword-based privacy score compared to the baseline.
Demonstrated improved resilience against prompt-injection strategies.

Abstract

Retrieval-Augmented Generation (RAG) has attracted significant attention due to its ability to combine the generative capabilities of Large Language Models (LLMs) with knowledge obtained through efficient retrieval mechanisms over large-scale data collections. Currently, the majority of existing approaches overlook the risks associated with exposing sensitive or access-controlled information directly to the generation model. Only a few approaches propose techniques to instruct the generative model to refrain from disclosing sensitive information; however, recent studies have also demonstrated that such strategies remain vulnerable to prompt leaking attacks that can exfiltrate sensitive information via prompt injection. For these reasons, we propose a novel approach to Selective Disclosure in Retrieval-Augmented Generation, called SD-RAG, which decouples the enforcement of privacy constraints from the answer-generation process itself. SD-RAG relies on pre-redaction, applying sanitization and disclosure controls during the retrieval phase, prior to augmenting the question-answering LLM’s input with sensitive data. Moreover, we introduce a semantic mechanism to allow the ingestion of human-readable dynamic security and privacy constraints together with an optimized graph-based data model that supports fine-grained, policy-aware retrieval. In our experiments, we focus on the single-turn scenario, where an external attacker that relies on a malicious prompt template attempts to obtain sensitive information from the system by asking one question. Our experimental evaluation shows a promising improvement over the baseline in the single-turn prompt leaking scenario, achieving up to a 58% increase in the keyword-based privacy score metric that we introduce.
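The pre-redaction idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the `Chunk` type, the sensitivity labels, the regex patterns, and the `pre_redact` and `build_prompt` helpers are all hypothetical, and simple regular expressions stand in for the semantic policy mechanism and graph-based data model that the paper proposes. The sketch only shows the ordering that matters: disclosure controls run on retrieved text before it is placed in the LLM's input, so an injected prompt cannot leak what the model never received.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Chunk:
    """A retrieved passage with hypothetical sensitivity labels set at ingestion."""
    text: str
    labels: frozenset


# Hypothetical redaction policy: regexes are a stand-in for the paper's
# semantic, human-readable constraints.
REDACTION_PATTERNS = {
    "email": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
    "salary": re.compile(r"\$\d[\d,]*"),
}


def pre_redact(chunk, allowed_labels):
    """Return sanitized text, or None if the requester may not see the chunk."""
    # Access control: drop chunks carrying any label the requester lacks.
    if not chunk.labels <= allowed_labels:
        return None
    # Sanitization: mask sensitive spans inside chunks that are disclosed.
    text = chunk.text
    for name, pattern in REDACTION_PATTERNS.items():
        text = pattern.sub(f"[REDACTED:{name}]", text)
    return text


def build_prompt(question, chunks, allowed_labels):
    """Assemble the LLM input from already-redacted context only."""
    context = []
    for chunk in chunks:
        sanitized = pre_redact(chunk, allowed_labels)
        if sanitized is not None:
            context.append(sanitized)
    return "Context:\n" + "\n".join(context) + "\n\nQuestion: " + question


chunks = [
    Chunk("Alice earns $120,000; contact alice@example.com", frozenset({"hr"})),
    Chunk("The office opens at 9am.", frozenset({"public"})),
]
public_prompt = build_prompt("When does the office open?", chunks, frozenset({"public"}))
hr_prompt = build_prompt("Who works here?", chunks, frozenset({"public", "hr"}))
```

In this sketch, `public_prompt` contains only the office-hours passage, while `hr_prompt` includes the HR passage with the salary and email masked. Because filtering happens before prompt assembly, the generation step needs no instruction to withhold information.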
