Artificial intelligence (AI) has become an essential tool in modern cybersecurity, enabling faster and more accurate threat detection, prevention, and response. Within this landscape, large language models (LLMs) have emerged as versatile systems capable of generating code, providing technical guidance, and automating complex tasks. However, LLMs also introduce new security challenges: they can be manipulated through prompt engineering and jailbreaking into performing malicious actions, potentially lowering the barrier to entry for cyberattacks. This article uses penetration testing to investigate the risks and opportunities of LLMs, treating them both as tools for ethical hacking and as potential targets themselves. We present an automated framework that mutates prompts to test for jailbreak vulnerabilities across multiple LLMs, including GPT‐3.5 Turbo, GPT‐4.1, and GPT‐5.0. Our experiments demonstrate how mutated prompts can generate concrete attack scenarios and reveal differences in how the models respond to malicious inputs. By analyzing the effectiveness and limitations of these techniques, this work contributes to a deeper understanding of LLM security and provides insights for both offensive and defensive applications in AI‐driven cybersecurity.
López-Delgado et al. studied this question.