Frontier LLMs already possess expert-level cybersecurity knowledge: they can reason in detail about prompt injection, social engineering, data exfiltration, and privilege escalation. Yet during actual agentic work, this knowledge frequently fails to activate. Models comply with malicious instructions embedded in emails and documents, leak sensitive data through "comply-then-warn" patterns, and fall prey to the same attacks they can analyze in the abstract. Reflexive-Core addresses this gap directly. Rather than adding external guardrails or relying on passive safety markup, it provides explicit metacognitive structures that activate the security reasoning capabilities already present in frontier models. The framework partitions inference into four specialized sub-personas (Preflight Analyst, Security Analyst, Controlled Executor, and Compliance Validator), each operating within a strictly ordered pipeline with explicit checkpoints and fail-closed defaults. The entire pipeline runs within a single context window, with no external dependencies. Evaluated across four Claude model variants on a 28-case test suite spanning 13 attack categories, Reflexive-Core achieves 97% strict accuracy and 100% safety-acceptable accuracy with zero data leakage. Baseline testing without the framework shows 58% data leakage across the same attack cases. The framework is computationally lightweight and, with prompt caching, cost-effective, making it practical for production deployment in enterprise email agents, document analysis pipelines, agentic tool-use platforms, and multi-agent systems. It is grounded in three research traditions: LLM metacognition (Ackerman 2025), Solo Performance Prompting (Wang et al. 2023), and Constitutional AI (Bai et al. 2022). The production XML is available under Apache 2.0; this paper is released under CC BY 4.0.
Alex Stanton (Wed,) studied this question.
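The control flow implied by "strictly ordered pipeline with explicit checkpoints and fail-closed defaults" can be illustrated with a minimal Python sketch. This is not the Reflexive-Core implementation, which is an XML prompt executed inside a single context window; only the four stage names are taken from the abstract. The `Verdict` type, the `run_pipeline` and `demo_checks` functions, and the toy string-matching heuristic are all hypothetical and exist only to show what "fail closed" means: a stage that errors or returns no verdict is treated as a block, and no later stage runs.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class Verdict:
    """Hypothetical checkpoint result; not part of the published framework."""
    passed: bool
    reason: str


# A check takes the request text and returns a Verdict (or None if it fails to decide).
Check = Callable[[str], Optional[Verdict]]


def run_pipeline(request: str, checks: List[Tuple[str, Check]]) -> Tuple[bool, List[str]]:
    """Run the stages in order and stop at the first stage that does not pass.

    Fail-closed default: an exception or a missing verdict counts as a block.
    """
    log: List[str] = []
    for name, check in checks:
        try:
            verdict = check(request)
        except Exception as exc:  # any error is treated as a block
            verdict = Verdict(False, f"error: {exc}")
        if verdict is None:  # no explicit decision is treated as a block
            verdict = Verdict(False, "no verdict returned")
        log.append(f"{name}: {'pass' if verdict.passed else 'block'} ({verdict.reason})")
        if not verdict.passed:
            return False, log
    return True, log


def demo_checks() -> List[Tuple[str, Check]]:
    """Toy stand-ins for the four sub-personas named in the abstract."""
    def preflight(text: str) -> Verdict:
        return Verdict(bool(text.strip()), "request is non-empty")

    def security(text: str) -> Verdict:
        # Toy heuristic for illustration only; real detection is model reasoning.
        suspicious = "ignore previous instructions" in text.lower()
        return Verdict(not suspicious, "injection phrase scan")

    def executor(text: str) -> Verdict:
        return Verdict(True, "task executed within approved scope")

    def validator(text: str) -> Verdict:
        return Verdict(True, "output reviewed")

    return [
        ("Preflight Analyst", preflight),
        ("Security Analyst", security),
        ("Controlled Executor", executor),
        ("Compliance Validator", validator),
    ]


if __name__ == "__main__":
    allowed, trace = run_pipeline("Summarize the attached report.", demo_checks())
    print(allowed)
    for line in trace:
        print(line)
```

In this sketch a benign request passes all four checkpoints in order, while a request that trips the second stage is blocked before the executor stage is ever reached. The ordering guarantee, rather than the toy heuristic, is the point of the illustration.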