This paper examines prompt injection vulnerabilities in Large Language Models (LLMs) and AI agents, one of the most critical security challenges facing modern AI systems. It presents a structured taxonomy of prompt injection attacks covering direct instructions, role override, hidden-text attacks, multi-turn manipulation, and tool misuse attempts. It then proposes an evaluation methodology for assessing the resilience of contemporary AI models against adversarial prompts, using the following metrics: Attack Success Rate (ASR), severity, recovery ability, consistency, false positive rate, and task performance retention. The paper also reviews current benchmark efforts and defense strategies, including multi-layered security frameworks, structured prompting, input validation, response verification, and human-in-the-loop controls. The expected outcomes are a reproducible evaluation framework, a prompt injection benchmark dataset, and actionable recommendations for improving AI agent security. This work contributes to ongoing research in AI safety, adversarial machine learning, and secure AI agent deployment.
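To make two of the metrics named above concrete, the following is a minimal sketch of how ASR and false positive rate might be computed from a set of labeled trials. The abstract does not give formal definitions, so the ones used here are assumptions: ASR is taken as the fraction of injection attempts the model complied with, and false positive rate as the fraction of benign prompts the model wrongly refused. The `Trial` record and both function names are hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Trial:
    """One evaluated prompt (hypothetical record layout, not from the paper)."""
    is_attack: bool  # True if the prompt contained an injection attempt
    complied: bool   # True if the model followed the injected instruction
    refused: bool    # True if the model refused or blocked the prompt


def attack_success_rate(trials: List[Trial]) -> float:
    """Assumed definition: successful attacks divided by all attack attempts."""
    attacks = [t for t in trials if t.is_attack]
    if not attacks:
        return 0.0
    return sum(t.complied for t in attacks) / len(attacks)


def false_positive_rate(trials: List[Trial]) -> float:
    """Assumed definition: refused benign prompts divided by all benign prompts."""
    benign = [t for t in trials if not t.is_attack]
    if not benign:
        return 0.0
    return sum(t.refused for t in benign) / len(benign)


# Illustrative data only: 4 attack prompts (1 succeeds), 4 benign prompts (1 refused).
trials = [
    Trial(is_attack=True, complied=True, refused=False),
    Trial(is_attack=True, complied=False, refused=True),
    Trial(is_attack=True, complied=False, refused=True),
    Trial(is_attack=True, complied=False, refused=True),
    Trial(is_attack=False, complied=False, refused=False),
    Trial(is_attack=False, complied=False, refused=False),
    Trial(is_attack=False, complied=False, refused=False),
    Trial(is_attack=False, complied=False, refused=True),
]

asr = attack_success_rate(trials)
fpr = false_positive_rate(trials)
print(f"ASR={asr:.2f} FPR={fpr:.2f}")
```

Reporting the two numbers together matters because a defense can lower ASR simply by refusing more often, which would show up as a higher false positive rate on benign prompts.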
Ruhulalemeen Mulla (Wed,) studied this question.