Key points are not available for this paper at this time.
Compared with the traditional usage of large language models (LLMs) where users directly send queries to an LLM, LLM-integrated applications serve as middleware to refine users' queries with domain-specific knowledge to better inform LLMs and enhance the responses. However, LLM-integrated applications also introduce new attack surfaces. This work considers a setup where the user and LLM interact via an application in the middle. We focus on the interactions that begin with user's queries and end with LLM-integrated application returning responses to the queries, powered by LLMs at the service backend. We identify potential high-risk vulnerabilities in this setting that can originate from the malicious application developer or from an outsider threat initiator that can control the database access, manipulate and poison high-risk data for the user. Successful exploits of the identified vulnerabilities result in the users receiving responses tailored to the intent of a threat initiator. We assess such threats against LLM-integrated applications empowered by GPT-3.5 and GPT-4. Our experiments show that the threats can effectively bypass the restrictions and moderation policies of OpenAI, resulting in users exposing to the risk of bias, toxic content, privacy, and disinformation. We develop a lightweight, threat-agnostic defense to mitigate insider and outsider threats. Our evaluations demonstrate the efficacy of our defense.
Jiang et al. (Fri,) studied this question.