What does this research mean for the field?

Prompt injection in agentic large language model (LLM) systems should be reframed as an architectural and operational safety problem rather than a purely linguistic weakness. Novelty: synthesis. Consensus alignment: neutral.

What question did this study set out to answer?

The research aims to address prompt injection vulnerabilities in agentic large language models (LLMs) by proposing a structural risk gating framework.

March 1, 2026 · Open Access

Beyond Prompt Filters: Structural Risk Gating for Agentic LLM Safety

Aya Mizutani

Key Points

The research aims to address prompt injection vulnerabilities in agentic large language models (LLMs) by proposing a structural risk gating framework.
Introduced a structural risk gating framework with design principles.
Identified key architectural and operational safety issues related to prompt injection.
Outlined a threat model and expected trade-offs for the implementation.
Proposed architectural preconditions to prevent prompt injection.
Recommended specific design principles for safer LLM deployment.
Emphasized the importance of trust-boundary management and privilege minimization.

Abstract

This preprint proposes a structural approach to mitigating prompt injection in agentic large language model (LLM) systems. While most existing defenses focus on prompt-level filtering, linguistic sanitization, or model alignment techniques, this work argues that prompt injection should be reframed as an architectural and operational safety problem. In tool-using and agentic LLM environments, attacks frequently exploit trust-boundary confusion, privilege escalation pathways, and irreversible execution channels rather than purely linguistic weaknesses.

The paper introduces a structural risk gating framework built on the following design principles:

• Separation of execution and auditing roles
• Explicit modeling of trust boundaries
• Privilege minimization with gated escalation
• Abstract risk labeling beyond attack templates
• Modality-aware auditing
• Governance-aware logging and reviewability

Instead of enumerating specific attack prompts, this framework targets the architectural preconditions that enable prompt injection to succeed. The paper outlines a threat model, expected trade-offs, and directions for empirical validation. It is intended as a conceptual contribution toward safer deployment architectures for agentic LLM systems.

Keywords: prompt injection, LLM security, agentic AI, AI governance, structural safety
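
The design principles named in the abstract can be made more concrete with a small sketch. The Python below is not the paper's implementation, which the abstract does not describe. It is a minimal illustration under stated assumptions: the names Trust, Risk, ToolCall, Auditor, and Executor, the three trust levels, the three risk labels, and the gating rule itself are all hypothetical. It shows an auditor that decides but never executes, an executor that cannot run anything without the auditor's approval, trust and risk labels attached to each call regardless of prompt wording, and a log of every decision for later review. Modality-aware auditing is not covered.

```python
from dataclasses import dataclass, field
from enum import IntEnum


class Trust(IntEnum):
    """Hypothetical trust levels: where the content behind a call came from."""
    UNTRUSTED = 0  # e.g. web pages, tool output, retrieved documents
    USER = 1       # the authenticated end user
    SYSTEM = 2     # operator-controlled configuration


class Risk(IntEnum):
    """Hypothetical abstract risk labels, independent of any attack template."""
    READ_ONLY = 0
    REVERSIBLE = 1
    IRREVERSIBLE = 2


@dataclass(frozen=True)
class ToolCall:
    tool: str
    risk: Risk
    origin: Trust  # lowest trust level of any content that influenced the call


@dataclass
class Auditor:
    """Decides whether a call may run; it never executes anything itself."""
    log: list = field(default_factory=list)

    def review(self, call: ToolCall, approved_by_human: bool = False) -> bool:
        if call.risk == Risk.READ_ONLY:
            allowed = True
        elif call.risk == Risk.REVERSIBLE:
            # Assumed rule: untrusted content may not trigger state changes.
            allowed = call.origin >= Trust.USER
        else:
            # Assumed rule: irreversible actions need a trusted origin
            # and an explicit human approval (gated escalation).
            allowed = call.origin >= Trust.USER and approved_by_human
        # Record every decision so it can be reviewed later.
        self.log.append((call.tool, call.risk.name, call.origin.name, allowed))
        return allowed


@dataclass
class Executor:
    """Runs tool calls, but only after the auditor has approved them."""
    auditor: Auditor

    def run(self, call: ToolCall, approved_by_human: bool = False) -> str:
        if not self.auditor.review(call, approved_by_human):
            return "blocked"
        # A real system would dispatch to the tool here.
        return "executed"


if __name__ == "__main__":
    executor = Executor(Auditor())
    injected = ToolCall("send_email", Risk.IRREVERSIBLE, Trust.UNTRUSTED)
    print(executor.run(injected, approved_by_human=True))
    print(executor.auditor.log)
```

In this sketch the gate depends only on the labels attached to the call, never on the text of the prompt, so an injected instruction arriving through untrusted content is blocked from irreversible actions even if a human approval flag is set. How the labels are assigned in practice is left open here, as it is in the abstract.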
