What type of study is this?

This is a Literature Review study.

April 30, 2026 · Open Access

Prompt injection attacks and defenses in large language models: A systematic literature review

Key Points

This study analyzes prompt injection attacks and defenses in large language models through a systematic literature review.
The review initially screened 207 studies and narrowed them to 56 relevant papers.
Papers were selected using explicit inclusion, exclusion, and quality criteria.
The analysis focuses on the taxonomical classification of attacks, recent and innovative attack techniques, and proposed defense mechanisms.
The findings reveal a rapidly evolving, multi-layered threat landscape that includes obfuscation strategies, automated and multi-modal attacks, and psychologically manipulative prompts.
The defenses identified include input-level sanitization, model-level filtering, prompt engineering, classification-based approaches, and architectural safeguards.
Future research should standardize attack classification and defense evaluation and favor empirical, quantitative approaches.

Abstract

Prompt injection has rapidly emerged as a critical security threat in the deployment of large language models (LLMs), enabling adversaries to subvert intended behaviors and bypass safety mechanisms. Despite the increased attention that this threat has received, no previous studies have systematically analyzed the field. This paper presents the first systematic literature review (SLR) on prompt injection, with the objective of giving researchers and practitioners a comprehensive, evidence-based understanding of existing attacks on and defenses for LLMs. We searched databases such as the ACM Digital Library, ScienceDirect, and Web of Science, initially screening 207 studies and ultimately focusing on 56 relevant papers selected by rigorous inclusion, exclusion, and quality criteria. The analysis is structured around three core research questions: (i) the taxonomical classification of prompt injection attacks, (ii) the identification of recent and innovative attack techniques, and (iii) the characterization of proposed defense mechanisms. The findings reveal a rapidly evolving and multi-layered threat landscape, encompassing obfuscation strategies, automated and multi-modal attacks, and psychologically manipulative prompts. In response, the literature proposes a range of defenses, including input-level sanitization, model-level filtering, prompt engineering, classification-based approaches, and architectural safeguards. Future research should focus on establishing robust standardization in both theory and experimentation, addressing the heterogeneity in attack classification and defense evaluation, while promoting empirical and quantitative approaches to assess effectiveness and considering user privacy and ethical implications.

Authors

Carmine Ambrosino

Institutions

University of Salerno
