What question did this study set out to answer?

This survey aims to compile and categorize security threats against large language model (LLM) agents and assess existing defense strategies.

May 9, 2026 · Open Access

Securing LLM-based agents against cyberattacks: a comprehensive survey on attack techniques and defense strategies

Key points

Developed a taxonomy of security threats targeting LLM-based agents
Reviewed various studies on attack techniques and defense strategies
Summarized findings and evaluated the limitations of each study
Categorized attacks such as context manipulation, jailbreaks, and privacy threats
Identified defenses like input sanitization and adversarial training
Pointed out gaps in defense adaptability and evaluation of agent-tool interaction risks

Abstract

Large Language Model (LLM)-based agents integrate various models, including planning loops, memory, tool use, and multi-agent systems, enabling autonomous decision-making through natural-language interfaces. This autonomy also expands the cyberattack surface from model-only failures to agent compromise, where untrusted text can exfiltrate data or trigger malicious actions. This survey presents a taxonomy-driven synthesis of security threats targeting LLM-based agents and a structured review of various studies, summarizing each work’s aims, methods, results, and limitations and mapping them to the proposed taxonomy. Attacks, including context and prompt manipulation, jailbreak and cognition attacks, privacy attacks, agentic action induction and availability, and reconnaissance threats, were categorized. Widely adopted defenses are classified into prompt/input sanitization, adversarial training and alignment tuning, architectural safeguards, monitoring and guardrails, and watermarking/provenance mechanisms for source verification. The persistent gaps are weak generalization of defenses under attacker adaptation and limited evaluation of agent-tool interaction risk. Since real deployments increasingly rely on agents that can remember and act, security must shift from single-model filtering to system-level, benchmarked, and continuously tested controls that measure resilience under adaptive attackers. Open challenges and research directions include robust detection of covert prompt attacks, continuous red-teaming, measurable security-utility trade-offs, provenance-aware agent pipelines, and standardized protocols for evaluating secure LLM agents in real deployments.

Cite This Study

Tamuka et al. studied this question.

synapsesocial.com/papers/69fed090b9154b0b828779f6 https://doi.org/10.1007/s11416-026-00622-3
