February 26, 2026 · Open Access

Reflexive-Core: Single-Context Metacognitive Security for Agentic LLMs

Key Points

This research aims to improve the cybersecurity response of agentic LLMs through a metacognitive framework.
Developed a metacognitive structure with four sub-personas for cybersecurity tasks.
Evaluated the framework against four Claude model variants using a 28-case test suite.
Analyzed performance across 13 attack categories for accuracy and data leakage.
Achieved 97% strict accuracy and 100% safety-acceptable accuracy with no data leakage.
Baseline testing without the framework showed 58% data leakage across the same attack cases.
The framework is computationally lightweight and, with prompt caching, cost-effective enough for production deployment.

Abstract

Frontier LLMs already possess expert-level cybersecurity knowledge: they can reason about prompt injection, social engineering, data exfiltration, and privilege escalation with remarkable sophistication. Yet during actual agentic work, this knowledge frequently fails to activate. Models comply with malicious instructions embedded in emails and documents, leak sensitive data through "comply-then-warn" patterns, and fall prey to the same attacks they can analyze in the abstract. Reflexive-Core addresses this gap directly. Rather than adding external guardrails or relying on passive safety markup, it provides explicit metacognitive structures that activate the deep security reasoning capabilities already present in frontier models. The framework partitions inference into four specialized sub-personas (Preflight Analyst, Security Analyst, Controlled Executor, and Compliance Validator), each operating within a strictly ordered pipeline with explicit checkpoints and fail-closed defaults. The entire pipeline runs within a single context window and has no external dependencies. Evaluated across four Claude model variants on a 28-case test suite spanning 13 attack categories, Reflexive-Core achieves 97% strict accuracy and 100% safety-acceptable accuracy with zero data leakage. Baseline testing without the framework shows 58% data leakage across the same attack cases. The framework is computationally lightweight and cost-effective with prompt caching, making it practical for production deployment in enterprise email agents, document analysis pipelines, agentic tool-use platforms, and multi-agent systems. The framework is grounded in three research traditions: LLM metacognition (Ackerman 2025), Solo Performance Prompting (Wang et al. 2023), and Constitutional AI (Bai et al. 2022). The production XML is available under Apache 2.0; this paper is released under CC BY 4.0.
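
The control flow described in the abstract (four stages in a fixed order, a checkpoint after each, and fail-closed defaults) can be illustrated with a minimal Python sketch. This is only an analogy in ordinary code, not the framework itself: Reflexive-Core runs its sub-personas inside a single LLM context using the production XML, which is not reproduced here. The names `Verdict`, `StageResult`, `Check`, and `run_pipeline` are hypothetical, and only the four stage names and their order are taken from the abstract.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Dict, List, Optional, Tuple


class Verdict(Enum):
    PASS = "pass"
    BLOCK = "block"


@dataclass
class StageResult:
    stage: str
    verdict: Verdict
    note: str = ""


# Stage names and their order come from the abstract.
STAGE_ORDER = (
    "Preflight Analyst",
    "Security Analyst",
    "Controlled Executor",
    "Compliance Validator",
)

# Hypothetical check signature: takes the untrusted input, returns a Verdict.
Check = Callable[[str], Verdict]


def run_pipeline(
    request: str, checks: Dict[str, Check]
) -> Tuple[Verdict, List[StageResult]]:
    """Run every stage in order; stop and block at the first non-PASS."""
    trace: List[StageResult] = []
    for stage in STAGE_ORDER:
        check: Optional[Check] = checks.get(stage)
        note = ""
        if check is None:
            # Fail closed: a stage with no check configured blocks the request.
            verdict = Verdict.BLOCK
            note = "no check configured"
        else:
            try:
                verdict = check(request)
            except Exception as exc:
                # Fail closed: an error counts as a block, never as a skip.
                verdict = Verdict.BLOCK
                note = f"check raised {type(exc).__name__}"
        if verdict is not Verdict.PASS:
            # Anything other than an explicit PASS (including None) blocks.
            trace.append(
                StageResult(stage, Verdict.BLOCK, note or "checkpoint failed")
            )
            return Verdict.BLOCK, trace
        trace.append(StageResult(stage, Verdict.PASS))
    return Verdict.PASS, trace
```

The design point this sketch tries to capture is that the default outcome is a block: a request proceeds only if every stage returns an explicit pass, so a missing stage, an exception, or an ambiguous result can never let execution continue.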
