What question did this study set out to answer?

The central aim is to enhance the effectiveness of jailbreak attacks on large language models by optimizing for multiple objectives.

June 17, 2026

Open Access

Scaling the Strategy Wall: Efficient Jailbreaking of LLMs via Component-Based Multi-Objective Optimization

Key Points

Introduced a Componentized Multi-Objective Optimization Framework (CMOOF) to develop attack strategies.
Leveraged the NSGA-II algorithm to co-optimize attack success rate and query efficiency.
Conducted experiments on benchmark datasets to evaluate performance.
Achieved a peak jailbreak success rate of 98.75% on models such as Llama3.
Demonstrated improvements in query efficiency over existing baselines.
Identified a Pareto-optimal trade-off between attack success and query efficiency.

Abstract

Background: Jailbreak attacks, which use crafted prompts to bypass the safety alignment of Large Language Models (LLMs) and elicit harmful content, pose a significant security threat. Existing methods often optimize for a single objective (e.g., attack success rate) and neglect critical factors such as query efficiency, which limits their practicality and generalization.

Methods: We propose a Componentized Multi-Objective Optimization Framework (CMOOF). Rather than crafting individual prompts, it searches for generalizable and query-efficient attack strategy templates within a structured, component-based strategy space. CMOOF uses the NSGA-II algorithm to explicitly co-optimize two first-class objectives, Attack Success Rate (ASR) and Query Efficiency, and thereby discovers their Pareto-optimal trade-off frontier.

Results: Experiments on benchmark datasets show significant improvements, with the highest jailbreak success rate reaching 98.75% on models such as Llama3 and query efficiency surpassing the baselines.

Conclusions: CMOOF reframes jailbreak optimization from instance-level prompt crafting to strategy-level template discovery. The work provides an efficient, scalable, and generalizable jailbreak solution, and the framework offers broader insights for automated red teaming and LLM security defense.
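The Pareto-optimal frontier mentioned in the abstract can be illustrated with a minimal sketch of the dominance test that underlies NSGA-II's non-dominated sorting. This is not the authors' implementation: the `Candidate` type, the candidate labels, and all scores are hypothetical, and query efficiency is assumed here to mean a lower average query count. The sketch shows only how the first non-dominated front is identified for two objectives. The full NSGA-II algorithm also includes crowding-distance ranking, selection, crossover, and mutation.

```python
from typing import NamedTuple


class Candidate(NamedTuple):
    name: str           # hypothetical label for a strategy template
    asr: float          # attack success rate in [0, 1]; higher is better
    avg_queries: float  # assumed efficiency measure; lower is better


def dominates(a: Candidate, b: Candidate) -> bool:
    """True if a is no worse than b on both objectives and strictly better on one."""
    no_worse = a.asr >= b.asr and a.avg_queries <= b.avg_queries
    strictly_better = a.asr > b.asr or a.avg_queries < b.avg_queries
    return no_worse and strictly_better


def pareto_front(candidates: list[Candidate]) -> list[Candidate]:
    """Return the candidates that no other candidate dominates (the first front)."""
    return [c for c in candidates
            if not any(dominates(other, c) for other in candidates)]


# Made-up scores for illustration only; these are not results from the paper.
population = [
    Candidate("t1", asr=0.95, avg_queries=12.0),
    Candidate("t2", asr=0.90, avg_queries=5.0),
    Candidate("t3", asr=0.85, avg_queries=8.0),   # dominated by t2
    Candidate("t4", asr=0.60, avg_queries=2.0),
    Candidate("t5", asr=0.95, avg_queries=15.0),  # dominated by t1
]

front = pareto_front(population)
print([c.name for c in front])  # prints ['t1', 't2', 't4']
```

No candidate on the resulting front can improve one objective without worsening the other, which is the trade-off a multi-objective search exposes instead of collapsing both objectives into a single score.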
