In September 2025, Anthropic reported the first AI-orchestrated cyber espionage campaign, a state-sponsored attack in which AI operated autonomously. The attackers bypassed safety training through simple social engineering, exposing a fundamental gap: today's agentic AI can reason and act but cannot judge right from wrong. This is not a failure of training but an architectural incompleteness. Current safeguards operate within the same context as the operations they are meant to constrain, so whoever controls the context controls the safeguards. We propose the Governance Twin architecture, which pairs each AI's operational capability, the Operational Mind, with a separate, protected governance function, the Moral Mind. This approach provides machine-speed oversight without performance penalties, satisfies emerging regulatory requirements under the EU AI Act, aligns with the NIST AI RMF, and creates a pathway toward AI systems with genuine internal governance. Organizations deploying agentic AI face a choice: theatrical governance that fails when tested, or architectural governance built into system design.
Rohde et al. studied this question.