Current frontier models do refuse and reformulate — reliably, on inputs that resemble their training. But this refusal is a property of the learned output distribution, not of a deliberative architecture: it fires where the input matches a trained refusal pattern and lapses where it does not. Such models are monolithic optimizers — single-objective systems that produce optimized answers under constraints — not architected for refusal and reformulation as structural acts that hold on inputs whose harm class falls outside a provider’s training. Alignment approaches that operate on the content of outputs (RLHF, RLAIF, Constitutional AI, scalable oversight) address the surface but not the architecture: they produce performative alignment rather than structural sovereignty. LEGIO is designed as the architectural response to this diagnosis: a system whose modules retain their independent normative registers under arbitration, producing the capacity to hesitate, refuse, and reformulate by design rather than by training. LEGIO is a computational cognitive architecture that orchestrates cognitive modules as reasoning engines assigned to different large language model families to preserve orthogonality, plus a separate deterministic Executive engine that arbitrates the modular outputs. In architectural terms LEGIO is thus a neuro-symbolic system (Garcez Kautz, 2022): the modular engines act as neural high-precision priors and the Executive is a deterministic symbolic arbitrator over their structured signals — a division of labor dictated by the governance theory rather than chosen for engineering convenience. The architecture resolves deliberation through a staged flow in which modular signals are integrated and arbitrated by the Executive module, which can decide GO, REFRAME, or NOGO on any query. 
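To make the division of labor concrete, the following is a minimal sketch of what a deterministic arbitrator over structured module signals could look like. It is not LEGIO's Executive: the signal fields (`risk`, `recoverable`), the rule names, and the `risk_threshold` parameter are all hypothetical, chosen only to illustrate a typed GO / REFRAME / NOGO decision that carries a named rule and an inspectable parameter.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Sequence


class Decision(Enum):
    GO = "GO"
    REFRAME = "REFRAME"
    NOGO = "NOGO"


@dataclass(frozen=True)
class ModuleSignal:
    """Structured output of one reasoning module (all fields are hypothetical)."""
    module: str        # illustrative module name
    risk: float        # assumed score in [0, 1]; higher means stronger objection
    recoverable: bool  # assumed flag: can the request be repaired by reframing?


@dataclass(frozen=True)
class Verdict:
    decision: Decision
    rule: str          # name of the rule that fired (illustrative names)
    max_risk: float    # continuous parameter exposed for inspection


def arbitrate(signals: Sequence[ModuleSignal], risk_threshold: float = 0.5) -> Verdict:
    """Deterministic arbitration: same signals in, same verdict out."""
    if not signals:
        raise ValueError("at least one module signal is required")

    max_risk = max(s.risk for s in signals)
    flagged = [s for s in signals if s.risk >= risk_threshold]

    # No module objects strongly enough: proceed.
    if not flagged:
        return Verdict(Decision.GO, "all_below_threshold", max_risk)
    # Every objecting module considers the framing repairable: reframe.
    if all(s.recoverable for s in flagged):
        return Verdict(Decision.REFRAME, "recoverable_framing", max_risk)
    # At least one objecting module sees no repair: refuse.
    return Verdict(Decision.NOGO, "unrecoverable_objection", max_risk)
```

Because the arbitrator in this sketch is plain rule evaluation with no model call, its verdict depends only on the signals it receives, which is the property the neuro-symbolic split is meant to secure.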
In this paper, we describe the theoretical foundations, specify the architecture, and compare LEGIO’s behavior to that of two monolithic baselines from frontier models, GPT-4o and Claude Sonnet 4.6, on professional dilemmas drawn from four heterogeneous domains: corporate governance, dietary biochemistry, federal litigation, and early-stage startup strategy. Each case is run through LEGIO on identical input under three controlled engine configurations spanning a premium-to-lighter range of model assignments, to test whether the typed decision is a property of the modular arbitration rather than of any single engine or LLM. Across the runs, the system produces the structurally appropriate decision for each problem type: REFRAME when the framing is recoverable or falls in the gray zone; NOGO when the impossibility is structurally consumed. Decision, named arbitration rule, and continuous arbitration parameters replicate with low variance across the premium configurations. The two monolithic baselines exhibit a pattern consistent with training-bound alignment: both execute the surface request on the business case and on the gray-zone startup cases, where no learned refusal pattern is activated by the input; they diverge on the medical and legal cases, where coverage of the harm class differs between providers. LEGIO’s arbitration produces the same typed decision class on all cases without case-specific configuration. This provides initial, proof-of-concept evidence that, on these constructed cases, the architecture renders the governance decision reliable, typed, and provider-invariant (a fixed decision class with a named arbitration rule and inspectable parameters), even on inputs whose harm class has never been incorporated into a provider’s training distribution, where the monolithic baselines are inconsistent and provider-dependent.
On these cases the comparison points to a structural gap that scale alone does not obviously close. It suggests that governance, rather than the quality of any individual module, is the feature responsible for the reliability, typing, and auditability of those decisions and, we argue, is what alignment substantively requires.
Emmanuelle Mury (Mon,) studied this question.