Large Language Models (LLMs) have surged in popularity due to their impressive ability to generate human-like text. Despite their widespread use, there is growing concern that they can disregard human ethics and produce harmful content. While many LLMs are aligned and equipped with safeguards, a category of prompt injection attacks known as jailbreaks is specifically designed to bypass these protections and elicit malicious output. Despite extensive research on novel jailbreak attacks and potential defenses, there has been limited exploration of how to accurately evaluate the success of these attacks. In this paper, we introduce seven evaluation methods used in research to determine the effectiveness of jailbreak attempts, and we conduct a comprehensive analysis of these seven methods with a particular focus on their accuracy. Our research aims to advance the discussion on improving the safety and alignment of LLMs with human values and to contribute to the development of more robust and secure LLM-based applications. Code is available at github.com/cenacle e18-4yp-An-Empirical-Study-On-Prompt-Injection-Attacks-And-Defenses.

Because of the weaknesses of these basic evaluation methods, there is a risk of misrepresenting both the actual effectiveness of jailbreak attacks and the security vulnerabilities of the models. This research therefore provides a comprehensive analysis of the often-overlooked limitations of each evaluation method and of their accuracy. Our goal is to lay the foundation for more standardized, reliable, and measurable evaluation metrics for determining the success of an attack, which will support future security research and enable the creation of more secure and user-friendly LLM applications.
Ali MD Sojib
Hossen MD Nafew
Hasan MD Ridoy
International Journal of Science and Research Archive
Sojib et al. studied this question.
www.synapsesocial.com/papers/68d4757f31b076d99fa6ccb3 — DOI: https://doi.org/10.30574/ijsra.2025.16.3.2588