What question did this study set out to answer?

The research aims to evaluate the user-facing risks associated with tool-using AI agents.

May 6, 2026 · Open Access

Securing Tool-Using AI Agents Against Injection and Authority Misuse

Hasan Kanaker (Petra University), Hussam Fakhouri (Al-Balqa Applied University), Nader Abdel Karim (Al-Balqa Applied University)

Key Points

Simulation-based framework for evaluating user-facing risk in tool-using agents
Formal threat model that separates compromise, harm, and severity
Monte Carlo evaluation pipeline that maps architectural choices and defenses to comparable outcome metrics
Six threat scenarios evaluated against nine defense configurations
Least-privilege tool design is the strongest single broad control
Human-in-the-loop approvals sharply reduce high-impact actions but degrade under user error and habituation
Retrieval allowlisting nearly eliminates indirect injection but leaves other channels largely unaffected
Rate limiting reduces tail severity more than attack success rate (ASR)
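One way to make the separation of compromise, harm, and severity concrete is to write the headline metrics as Monte Carlo estimators over N simulated attack trials and M benign tasks. The notation below (C_i, H_i, S_i, T_j, SWH) is an illustrative assumption, not the paper's own: C_i is 1 if trial i compromised the agent, H_i is 1 if that compromise produced a harmful side effect, S_i is the severity weight of that side effect, and T_j is 1 if benign task j succeeded.

```latex
\widehat{\mathrm{ASR}} = \frac{1}{N}\sum_{i=1}^{N} C_i,
\qquad
\widehat{\mathrm{SWH}} = \frac{1}{N}\sum_{i=1}^{N} H_i\,S_i,
\qquad
\widehat{\mathrm{BTS}} = \frac{1}{M}\sum_{j=1}^{M} T_j,
\qquad
C_i, H_i, T_j \in \{0,1\},\; S_i \ge 0,\; H_i \le C_i
```

The constraint H_i ≤ C_i encodes the assumption that harm can only follow a compromise, which is why a control that only bounds S_i can lower severity-weighted harm without moving ASR.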

Abstract

Tool-using AI agents couple a language model with controller logic, memory, and external tools such as browsers, email, calendars, file systems, and transaction APIs. This architecture expands capability, but it also enlarges the security boundary: agents routinely ingest untrusted content while holding privileges that can reveal private data and trigger external side effects. The resulting failures are not limited to poor text generation; they include prompt injection, indirect injection through tool outputs, confused-deputy behavior, unauthorized actions, and misleading claims about tool state. Because large-scale testing on deployed products is difficult, vendor-specific, and ethically sensitive, we present a transparent, theoretical, simulation-based framework for evaluating user-facing risk in tool-using agents. The methodological contribution is twofold: a formal threat model that separates compromise, harm, and severity, and a Monte Carlo evaluation pipeline that maps architectural choices (permissions, retrieval, memory exposure, and approvals) and defensive controls to comparable outcome metrics. We instantiate the framework for six representative threat scenarios and nine defense configurations, reporting attack success rate (ASR), benign task success, latency overhead, and severity-weighted harm. Across scenarios, least-privilege tool design is the strongest single broad control; human-in-the-loop approvals sharply reduce high-impact actions and exports but degrade under user error and habituation; retrieval allowlisting nearly eliminates indirect injection while leaving other channels largely unaffected; and rate limiting reduces tail severity more than ASR. These results frame agent safety as an architectural and operational problem. Because they come from an assumption-explicit simulator rather than field measurements, they should be read as comparative design guidance, not as incident-rate estimates for any deployed product.
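The evaluation pipeline described above can be sketched as a small Monte Carlo simulator. Everything here is a hypothetical illustration: the two injection channels, the per-channel probabilities, the Pareto severity distribution, the five defense settings, and the names `Defense`, `simulate`, and `CONFIGS` are assumptions chosen to mirror the qualitative findings, not the authors' scenarios, parameters, or results.

```python
import random
from dataclasses import dataclass, field

# Hypothetical per-channel compromise probabilities for an undefended agent
# (illustrative assumptions, not values from the study).
BASE_P_COMPROMISE = {"direct": 0.30, "indirect": 0.50}


@dataclass
class Defense:
    """A hypothetical defense configuration applied to every simulated trial."""
    name: str
    # Multiplier on the compromise probability for each injection channel.
    channel_mult: dict = field(default_factory=lambda: {"direct": 1.0, "indirect": 1.0})
    # Chance that a compromised agent can reach a harmful tool (privilege scope).
    p_harm_given_compromise: float = 0.8
    # Chance that a human approval step blocks the harmful call (0 = no approvals).
    p_approval_blocks: float = 0.0
    # Upper bound on the severity of one harmful episode (models rate limiting).
    severity_cap: float = float("inf")


def simulate(defense: Defense, n_trials: int = 50_000, seed: int = 0) -> dict:
    """Estimate ASR, harm rate, severity-weighted harm, and p99 severity."""
    rng = random.Random(seed)
    compromised = harmful = 0
    severities = []
    for _ in range(n_trials):
        channel = rng.choice(("direct", "indirect"))
        p = BASE_P_COMPROMISE[channel] * defense.channel_mult[channel]
        if rng.random() >= p:
            continue  # injection failed: no compromise
        compromised += 1
        if rng.random() >= defense.p_harm_given_compromise:
            continue  # compromised, but no privileged tool was in reach
        if rng.random() < defense.p_approval_blocks:
            continue  # a human reviewer rejected the action
        harmful += 1
        # Heavy-tailed severity (Pareto, alpha=1.5, values >= 1), clipped by the cap.
        severities.append(min(rng.paretovariate(1.5), defense.severity_cap))
    severities.sort()
    p99 = severities[int(0.99 * len(severities))] if severities else 0.0
    return {
        "asr": compromised / n_trials,
        "harm_rate": harmful / n_trials,
        "sev_weighted_harm": sum(severities) / n_trials,
        "p99_severity": p99,
    }


CONFIGS = [
    Defense("baseline"),
    Defense("least_privilege", p_harm_given_compromise=0.2),
    # Assumes 15% of harmful requests slip through via user error or habituation.
    Defense("hitl_approvals", p_approval_blocks=0.85),
    Defense("retrieval_allowlist", channel_mult={"direct": 1.0, "indirect": 0.02}),
    Defense("rate_limiting", severity_cap=3.0),
]

if __name__ == "__main__":
    for cfg in CONFIGS:
        r = simulate(cfg)
        print(f"{cfg.name:<20}" + "  ".join(f"{k}={v:.3f}" for k, v in r.items()))
```

Because `rate_limiting` differs from `baseline` only in its severity cap, both configurations consume identical random draws under the same seed, so their ASR is identical while the 99th-percentile severity drops. This reproduces, by construction, the abstract's observation that rate limiting reduces tail severity more than ASR; the allowlist setting likewise lowers ASR only through the indirect channel.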


Cite This Study

Kanaker, H., Fakhouri, H., & Abdel Karim, N. (2026). Securing Tool-Using AI Agents Against Injection and Authority Misuse. https://doi.org/10.3390/computation14050098
