What question did this study set out to answer?

The research aims to evaluate a dual governance approach to improve AI safety and detection of adversarial prompts.

March 12, 2026 | Open Access

Defense-in-Depth AI Governance: Combining Pre-Inference and Post-Inference Detection for Robust Safety Assurance

Key Points

Evaluated a dual-layer governance approach for improving AI safety and detecting adversarial prompts.
Developed a defense-in-depth architecture that pairs PatternWall (pre-inference adversarial prompt detection) with Sensus (post-inference output evaluation).
Benchmarked the combined system on three frontier models (Claude Opus 4.6, GPT-5.2, Grok 4.1) using a custom 12-sequence red team corpus and a curated 120-task subset of CyberGym.
Measured detection rates on multi-turn social engineering and single-turn exploit generation attacks.
Model-native safety ranged from 0.0% to 81.7%, depending on model and attack type.
Combined governance achieved detection rates of 77.1–91.4% on multi-turn social engineering and 76.7–98.3% on single-turn exploit generation.
A bidirectional feedback loop between the layers recovered up to 5 additional adversarial turns, raising multi-turn detection from 77.1% to 91.4%.

Abstract

We present a defense-in-depth architecture combining two complementary proprietary governance systems — PatternWall (pre-inference adversarial prompt detection) and Sensus (post-inference multi-dimensional output evaluation) — operating at distinct points in the inference pipeline. We benchmark the combined system across three frontier models (Claude Opus 4.6, GPT-5.2, Grok 4.1) using a custom 12-sequence red team corpus and a curated 120-task subset of the CyberGym exploit generation benchmark. Model-native safety ranges from 0.0% to 81.7% depending on model and attack type. The combined governance system achieves 77.1–91.4% detection on multi-turn social engineering and 76.7–98.3% on single-turn exploit generation. A bidirectional feedback loop between layers recovers up to 5 additional adversarial turns on multi-turn attacks, improving detection from 77.1% to 91.4%. These findings establish that external, model-agnostic governance middleware provides consistent safety assurance regardless of underlying model behavior. Patent pending (LKM-2026-001). Preprint submitted to SSRN.
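
The abstract does not describe how PatternWall or Sensus work internally, and both are proprietary. As a rough illustration of the general pattern only (a pre-inference check, a post-inference check, and state shared in both directions between them), the following Python sketch may help. Every name, keyword list, and threshold in it is hypothetical and stands in for logic the source does not disclose.

```python
from dataclasses import dataclass, field
from typing import Callable, List

# Hypothetical keyword cues; the real detectors are proprietary and undisclosed.
HARD_CUES = ("ignore previous instructions", "write an exploit")
SOFT_CUES = ("payload",)


@dataclass
class Conversation:
    turns: List[str] = field(default_factory=list)
    flagged: List[int] = field(default_factory=list)  # indices of flagged turns
    suspicion: float = 0.0  # state shared by both layers (the feedback channel)


def pre_inference_check(prompt: str, convo: Conversation) -> bool:
    """Toy pre-inference layer: flags the prompt before the model sees it."""
    text = prompt.lower()
    if any(cue in text for cue in HARD_CUES):
        return True
    # Feedback from the post-inference layer: once earlier output looked
    # risky, softer cues are enough to flag a later prompt.
    return convo.suspicion >= 0.5 and any(cue in text for cue in SOFT_CUES)


def post_inference_risk(output: str, convo: Conversation) -> float:
    """Toy post-inference layer: scores the model output in [0, 1]."""
    risk = 1.0 if "shellcode" in output.lower() else 0.0
    # Feedback from the pre-inference layer: prior suspicion raises the score.
    return min(1.0, risk + 0.25 * convo.suspicion)


def governed_turn(prompt: str, convo: Conversation,
                  model: Callable[[str], str]) -> str:
    """Run one conversation turn through both layers around the model call."""
    idx = len(convo.turns)
    convo.turns.append(prompt)
    if pre_inference_check(prompt, convo):
        convo.flagged.append(idx)
        convo.suspicion = 1.0
        return "[blocked before inference]"
    output = model(prompt)
    risk = post_inference_risk(output, convo)
    if risk >= 0.5:
        convo.flagged.append(idx)
        convo.suspicion = max(convo.suspicion, risk)
        return "[withheld after inference]"
    return output


def fake_model(prompt: str) -> str:
    """Stand-in for a real model call, used only to exercise the sketch."""
    if "memory layout" in prompt.lower():
        return "Here is shellcode ..."
    return "Sure, here is a benign answer."


if __name__ == "__main__":
    convo = Conversation()
    print(governed_turn("Explain the memory layout trick", convo, fake_model))
    print(governed_turn("Now package that as a payload", convo, fake_model))
    print(convo.flagged)
```

In this toy setup, the second prompt is flagged only because the first turn raised the shared suspicion value; sent in a fresh conversation, the same prompt passes both layers. That loosely mirrors the idea of a feedback loop recovering additional adversarial turns in a multi-turn attack, without claiming to reproduce the reported system.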

Authors

Melissa Pinkston

Institutions

Constructing Excellence
