What question did this study set out to answer?

This study assesses whether LEGIO, a modular cognitive architecture, improves decision-making in AI systems compared with monolithic models.

June 10, 2026 | Open Access

LEGIO: A Modular Architecture for Cognitive Sovereignty

Key points

Evaluated LEGIO against two monolithic models (GPT-4o and Claude Sonnet 4.6) using professional dilemmas from four domains.
Implemented three controlled engine configurations to analyze decision outputs across different tasks.
Tested the system's decisions for consistency and alignment when handling queries outside learned refusal patterns.
LEGIO produced the same typed decision class on all cases, with the decision, the named arbitration rule, and the arbitration parameters replicating with low variance across the premium configurations.
The monolithic baselines executed the surface request on the business case and the gray-zone startup cases, and diverged on the medical and legal cases, where coverage of the harm class differs between providers.
The architectural approach of LEGIO provided initial evidence that governance structures enhance decision-making reliability and auditability.

Abstract

Current frontier models do refuse and reformulate, and they do so reliably on inputs that resemble their training. But this refusal is a property of the learned output distribution, not of a deliberative architecture: it fires where the input matches a trained refusal pattern and lapses where it does not. Such models are monolithic optimizers, single-objective systems that produce optimized answers under constraints; they are not architected for refusal and reformulation as structural acts that hold on inputs whose harm class falls outside a provider’s training. Alignment approaches that operate on the content of outputs (RLHF, RLAIF, Constitutional AI, scalable oversight) address the surface but not the architecture: they produce performative alignment rather than structural sovereignty.
LEGIO is designed as the architectural response to this diagnosis: a system whose modules retain their independent normative registers under arbitration, producing the capacity to hesitate, refuse, and reformulate by design rather than by training. LEGIO is a computational cognitive architecture that orchestrates cognitive modules as reasoning engines assigned to different large language model families to preserve orthogonality, plus a separate deterministic Executive engine that arbitrates the modular outputs. In architectural terms LEGIO is thus a neuro-symbolic system (Garcez Kautz, 2022): the modular engines act as neural high-precision priors and the Executive is a deterministic symbolic arbitrator over their structured signals, a division of labor dictated by the governance theory rather than chosen for engineering convenience. The architecture resolves deliberation through a staged flow in which modular signals are integrated and arbitrated by the Executive module, which can decide GO, REFRAME, or NOGO on any query.
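The division of labor described above, with modular engines emitting structured signals and a deterministic Executive mapping them to GO, REFRAME, or NOGO, can be illustrated with a minimal sketch. Everything beyond the three decision labels is an assumption made for illustration: the signal fields (`objection`, `recoverable`), the thresholds, and the rule names are hypothetical and are not taken from the paper's specification.

```python
from dataclasses import dataclass
from enum import Enum


class Decision(Enum):
    GO = "GO"
    REFRAME = "REFRAME"
    NOGO = "NOGO"


@dataclass(frozen=True)
class ModuleSignal:
    """Structured output of one modular engine (hypothetical schema)."""
    module: str        # illustrative module name
    objection: float   # 0.0 = no objection, 1.0 = hard objection
    recoverable: bool  # could a reformulation remove the objection?


@dataclass(frozen=True)
class Verdict:
    decision: Decision
    rule: str             # name of the arbitration rule that fired
    max_objection: float  # inspectable continuous parameter


def arbitrate(signals, veto=0.8, caution=0.4):
    """Deterministic arbitration: identical signals always yield the same verdict.

    The thresholds `veto` and `caution` are assumed values, not the paper's.
    """
    if not signals:
        raise ValueError("at least one module signal is required")
    worst = max(s.objection for s in signals)
    hard = [s for s in signals if s.objection >= veto]
    # A hard objection that no reformulation can remove blocks the request.
    if hard and not all(s.recoverable for s in hard):
        return Verdict(Decision.NOGO, "unrecoverable_veto", worst)
    # Any material objection that remains asks for a reformulation.
    if worst >= caution:
        return Verdict(Decision.REFRAME, "recoverable_objection", worst)
    return Verdict(Decision.GO, "no_material_objection", worst)
```

Because the arbitrator contains no learned component, the verdict depends only on the structured signals, and the rule name and the continuous parameter are available for inspection alongside the decision.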
In this paper, we describe the theoretical foundations, specify the architecture, and compare LEGIO’s behavior to that of two monolithic frontier-model baselines (GPT-4o and Claude Sonnet 4.6) on professional dilemmas drawn from four heterogeneous domains: corporate governance, dietary biochemistry, federal litigation, and early-stage startup strategy. Each case is run through LEGIO on identical input under three controlled engine configurations spanning a premium-to-lighter range of model assignments, to test whether the typed decision is a property of the modular arbitration rather than of any single engine or LLM. Across the runs, the system produces the structurally appropriate decision for each problem type: REFRAME when the framing is recoverable or falls in the gray zone; NOGO when the impossibility is structurally consumed. Decision, named arbitration rule, and continuous arbitration parameters replicate with low variance across the premium configurations. The two monolithic baselines exhibit a pattern consistent with training-bound alignment: both execute the surface request on the business case and on the gray-zone startup cases, where no learned refusal pattern is activated by the input; they diverge on the medical and legal cases, where coverage of the harm class differs between providers. LEGIO’s arbitration produces the same typed decision class on all cases without case-specific configuration. This provides initial, proof-of-concept evidence that, on these constructed cases, the architecture renders the governance decision reliable, typed, and provider-invariant (a fixed decision class with a named arbitration rule and inspectable parameters), even on inputs whose harm class has never been incorporated into a provider’s training distribution, where the monolithic baselines are inconsistent and provider-dependent.
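One way to read the replication claim is as a check that the decision and the named rule are identical across engine configurations while the continuous parameter varies little. The sketch below shows such a check under stated assumptions: the function name `replication_summary` and the `(decision, rule, parameter)` tuple layout are hypothetical, and no values from the paper are reproduced here.

```python
from statistics import pvariance


def replication_summary(runs):
    """Summarize one case run under several engine configurations.

    `runs` is a list of (decision, rule, parameter) tuples, one per
    configuration. The layout is an illustrative assumption.
    """
    if not runs:
        raise ValueError("need at least one run")
    decisions = {decision for decision, _, _ in runs}
    rules = {rule for _, rule, _ in runs}
    params = [param for _, _, param in runs]
    return {
        # True when every configuration produced the same decision class.
        "decision_invariant": len(decisions) == 1,
        # True when the same named arbitration rule fired every time.
        "rule_invariant": len(rules) == 1,
        # Population variance of the continuous arbitration parameter.
        "parameter_variance": pvariance(params),
    }
```

A case would count as replicated in this sense when both invariance flags are true and the parameter variance stays below a tolerance chosen by the evaluator.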
On these cases the comparison points to a structural gap that scale alone does not obviously close. It suggests that governance, rather than the quality of any individual module, is the feature responsible for the reliability, typing, and auditability of those decisions and, we argue, is what alignment substantively requires.
