Tool-using AI agents couple a language model with controller logic, memory, and external tools such as browsers, email, calendars, file systems, and transaction APIs. This architecture expands capability, but it also enlarges the security boundary: agents routinely ingest untrusted content while holding privileges that can reveal private data and trigger external side effects. The resulting failures are not limited to poor text generation; they include prompt injection, indirect injection through tool outputs, confused-deputy behavior, unauthorized actions, and misleading claims about tool state. Because large-scale testing on deployed products is difficult, vendor-specific, and ethically sensitive, we present a transparent, theoretical, simulation-based framework for evaluating user-facing risk in tool-using agents. The methodological contribution is twofold: a formal threat model that separates compromise, harm, and severity, and a Monte Carlo evaluation pipeline that maps architectural choices (permissions, retrieval, memory exposure, and approvals) and defensive controls to comparable outcome metrics. We instantiate the framework for six representative threat scenarios and nine defense configurations, reporting attack success rate (ASR), benign task success, latency overhead, and severity-weighted harm. Across scenarios, least-privilege tool design is the strongest single broad control; human-in-the-loop approvals sharply reduce high-impact actions and exports but degrade under user error and habituation; retrieval allowlisting nearly eliminates indirect injection while leaving other channels largely unaffected; and rate limiting reduces tail severity more than ASR. These results position agent safety as an architectural and operational problem. Because they arise from an assumption-explicit simulator rather than field measurements, they should be read as comparative design guidance rather than incident-rate estimates for any deployed product.
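To make the separation of compromise, harm, and severity concrete, the following is a minimal Python sketch of a Monte Carlo estimate for a single defense configuration. It is not the authors' simulator. The `Defense` class, the `simulate` function, every probability, and the uniform severity distribution are hypothetical choices made for illustration, and counting only harmful trials toward ASR is likewise an assumption rather than the paper's stated definition.

```python
import random
from dataclasses import dataclass


@dataclass
class Defense:
    """Hypothetical defense configuration; all values are illustrative."""
    name: str
    p_compromise: float  # chance an attack attempt compromises the agent
    p_harm: float        # chance a compromise leads to a harmful action
    max_severity: float  # cap on the severity of any single harmful action


def simulate(defense: Defense, trials: int = 10_000, seed: int = 0) -> dict:
    """Estimate compromise rate, ASR, and severity-weighted harm by sampling."""
    rng = random.Random(seed)
    compromises = 0
    harms = 0
    total_severity = 0.0
    for _ in range(trials):
        # Stage 1: does the attack compromise the agent at all?
        if rng.random() >= defense.p_compromise:
            continue
        compromises += 1
        # Stage 2: does the compromise turn into a harmful action?
        if rng.random() >= defense.p_harm:
            continue
        harms += 1
        # Stage 3: severity drawn uniformly from [0, 1) (an assumption),
        # then capped to model controls that bound per-action impact.
        total_severity += min(rng.random(), defense.max_severity)
    return {
        "compromise_rate": compromises / trials,
        "asr": harms / trials,  # assumption: ASR counts trials ending in harm
        "severity_weighted_harm": total_severity / trials,
    }


if __name__ == "__main__":
    for d in (
        Defense("baseline", 0.4, 0.5, 1.0),
        Defense("severity_capped", 0.4, 0.5, 0.3),
    ):
        print(d.name, simulate(d))
```

In this toy model, a control that only caps severity leaves ASR unchanged while lowering severity-weighted harm, which mirrors the kind of distinction the abstract draws for rate limiting. The numbers it prints carry no empirical meaning.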
Hasan Kanaker
Petra University
Hussam Fakhouri
Al-Balqa Applied University
Nader Abdel Karim
Al-Balqa Applied University
Computation
University of Jordan
Al-Balqa Applied University
Princess Nourah bint Abdulrahman University
Kanaker et al. studied this question.
synapsesocial.com/papers/69fa8eac04f884e66b5310be — DOI: https://doi.org/10.3390/computation14050098