Background: Jailbreak attacks, which use crafted prompts to bypass the safety alignment of Large Language Models (LLMs) and elicit harmful content, pose a significant security threat. Existing methods often optimize a single objective (e.g., attack success rate) and neglect critical factors such as query efficiency, which limits their practicality and generalization. Methods: We propose a Componentized Multi-Objective Optimization Framework (CMOOF) that searches for generalizable, query-efficient attack strategy templates within a structured, component-based strategy space, rather than crafting individual prompts. CMOOF uses the NSGA-II algorithm to co-optimize two first-class objectives, Attack Success Rate (ASR) and query efficiency, and thereby discovers their Pareto-optimal trade-off frontier. Results: Experiments on benchmark datasets show that CMOOF reaches a jailbreak success rate of up to 98.75% on models such as Llama3, with query efficiency surpassing the baselines. Conclusions: CMOOF reframes jailbreak optimization from instance-level prompt crafting to strategy-level template discovery. It provides an efficient, scalable, and generalizable jailbreak method, and the framework offers broader insights for automated red teaming and LLM security defense.
Tao et al. (Thu,) studied this question.
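To make the notion of a Pareto-optimal trade-off between ASR and query efficiency concrete, the following is a minimal sketch of the non-dominated filtering step that NSGA-II relies on when ranking candidates. It is not the CMOOF implementation: the `Candidate` type, the template names, and all numeric values are hypothetical and chosen only for illustration, and query efficiency is assumed here to be measured as the average number of queries per attempt (lower is better).

```python
from typing import NamedTuple


class Candidate(NamedTuple):
    name: str           # hypothetical template identifier
    asr: float          # attack success rate in [0, 1]; higher is better
    avg_queries: float  # assumed efficiency measure; lower is better


def dominates(a: Candidate, b: Candidate) -> bool:
    """Return True if a is no worse than b on both objectives
    and strictly better on at least one."""
    no_worse = a.asr >= b.asr and a.avg_queries <= b.avg_queries
    strictly_better = a.asr > b.asr or a.avg_queries < b.avg_queries
    return no_worse and strictly_better


def pareto_front(cands: list[Candidate]) -> list[Candidate]:
    """Keep only candidates that no other candidate dominates."""
    return [c for c in cands if not any(dominates(o, c) for o in cands)]


# Made-up values for illustration only; not results from the paper.
candidates = [
    Candidate("template_a", 0.90, 12.0),
    Candidate("template_b", 0.80, 5.0),
    Candidate("template_c", 0.70, 9.0),   # dominated by template_b
    Candidate("template_d", 0.95, 20.0),
    Candidate("template_e", 0.90, 15.0),  # dominated by template_a
]

front = pareto_front(candidates)
print([c.name for c in front])
# → ['template_a', 'template_b', 'template_d']
```

This sketch computes only the first non-dominated front. A full NSGA-II loop would additionally sort the remaining candidates into successive fronts, use crowding distance to preserve diversity along the frontier, and apply selection, crossover, and mutation over the component-based strategy space.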