This initiative presents an automated red team framework designed to evaluate the safety and robustness of AI language models against adversarial attacks The rapid deployment of large language models (LLMs) has become critical to ensure their reliability against early injection, jailbreak attempts, and manipulation attacks. The proposed system simulates real attack scenarios with the help of adversary prompt generation and tests them against the target AI version The system integrates three main components: a prompt generator, a target model, and a response analyzer. The generator generates attack effects, the target version responds, and the analyzer evaluates security violations. In addition, the memory module stores previously detected threats for future prevention. The experimental effects show that the system is able to detect dangerous responses, classify hazards, and enhance the safety assessment of AI. This answer presents a rational framework for automated AI security testing.
Haripriya et al. (Wed,) studied this question.