March 8, 2026

Push and Pull: Defending Against Retrieval Poisoning Attacks via Embedding Space Reshaping

Longzhu He (Beijing University of Posts and Telecommunications), Xi Zhang (General Cardiology), Quan Liu (Beijing University of Posts and Telecommunications)

Abstract

Retrieval-Augmented Generation (RAG) improves the performance of Large Language Models (LLMs) by retrieving relevant information from external knowledge bases and integrating it into generation, which helps produce more accurate responses. However, RAG is vulnerable to retrieval poisoning attacks, in which attackers inject malicious documents into the retrieval process to induce the LLM to produce inaccurate responses. In this paper, we propose ShieldRAG, a novel defense framework that counteracts retrieval poisoning attacks by reshaping the retrieval embedding space. ShieldRAG relies on a dual-strategy effect realized via a majority-consensus mechanism: ① Push: implicitly forces the embedding of a user query away from malicious documents by filtering out their minority signals, reducing their influence. ② Pull: aligns the embedding of a user query more closely with that of benign documents, reinforcing accurate retrieval. The two strategies work together to preserve retrieval integrity and improve the quality of LLM-generated responses. Specifically, ShieldRAG operates through three key steps: Sliding Retrieval Explanation Generation, Keyword Aggregation, and Query Targeting Optimization. Together, these steps integrate information from benign sources while filtering out malicious interference, thereby significantly enhancing the robustness of RAG systems against retrieval poisoning attacks. We evaluate ShieldRAG on four open-domain Question Answering (QA) datasets (Natural Questions, MS-MARCO, HotpotQA, and 2WikiMultiHopQA) using seven representative LLMs. Extensive experiments demonstrate that ShieldRAG significantly improves response accuracy while mitigating adversarial effects, and that it generalizes well across datasets and LLM architectures.
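
The abstract names the three steps without giving their details, so the following Python sketch is only one possible reading of the majority-consensus idea, not the authors' implementation. It assumes that keywords have already been extracted (for example by an LLM explanation step, omitted here) from each sliding window of retrieved documents. The function names `sliding_windows`, `aggregate_keywords`, and `refine_query`, the window parameters, and the `min_fraction` threshold are all hypothetical.

```python
from collections import Counter


def sliding_windows(docs, size, stride):
    """Return overlapping windows over a ranked list of retrieved documents.

    Assumption: the paper's "sliding retrieval" is modeled here as a plain
    fixed-size window moved over the ranking; the real scheme may differ.
    """
    if len(docs) <= size:
        return [list(docs)]
    return [list(docs[i:i + size]) for i in range(0, len(docs) - size + 1, stride)]


def aggregate_keywords(keyword_sets, min_fraction=0.5):
    """Keep keywords that occur in more than `min_fraction` of the windows.

    Keywords backed only by a minority of windows (where injected documents
    would be expected to show up) are dropped. This is the hypothetical
    "push" effect; the surviving consensus keywords provide the "pull".
    """
    # Count each keyword at most once per window.
    counts = Counter(kw for kws in keyword_sets for kw in set(kws))
    threshold = min_fraction * len(keyword_sets)
    return sorted(kw for kw, n in counts.items() if n > threshold)


def refine_query(query, keywords):
    """Append consensus keywords to the query before it is re-embedded.

    Assumption: query targeting is approximated by simple text augmentation.
    """
    if not keywords:
        return query
    return f"{query} ({', '.join(keywords)})"


if __name__ == "__main__":
    # Toy keyword sets for four windows; the third one carries a minority signal.
    windows = [
        ["paris", "france"],
        ["paris", "capital"],
        ["lyon"],
        ["paris", "eiffel"],
    ]
    consensus = aggregate_keywords(windows)
    print(refine_query("What is the capital of France?", consensus))
```

In this toy run only `paris` appears in more than half of the windows, so the minority keyword `lyon` never reaches the refined query. Whether ShieldRAG uses a fixed threshold, text augmentation, or a direct operation on embeddings is not stated in the abstract.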
