What question did this study set out to answer?

This paper investigates the design of a cognitive architecture for large language model (LLM) multi-agent systems, focusing on mortality parameters and confidence calibration.

May 14, 2026 · Open Access

Designing Andrew: A Cognitive Architecture for LLM Multi-Agent Systems with Empirically Validated Mortality Parameters and Settlement-Grounded Confidence Calibration

Key Points

Developed Lobster, a 19-module cognitive substrate organized under 15 top-level layers and instantiated across 21 named agents.
Assessed how the architecture's mortality parameter responds to template-violation events, and whether reported confidence is calibrated against settled outcomes.
Evaluated settled win rates in a competitive prediction-market environment over three weeks against the fair-odds baseline.
The mortality parameter annihilationDread showed positive within-agent rank correlation in all 21 agents, with regime-shift dynamics in response to template violations (Spearman ρ ∈ [0.19, 0.99], p < 0.001).
Win rate rose monotonically with reported confidence, from 53.0% at low confidence to 83.3% at high confidence (Cochran–Armitage trend test, Z = 3.85, p = 1.16 × 10⁻⁴).
Settled win rate held stable at 52.8% ± 1.85% over three weeks, statistically distinct from the fair-odds baseline.
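To make "statistically distinct from the fair-odds baseline" concrete, here is a minimal sketch of one way such a separation could be checked: a one-sample z-test of a settled win rate against a fixed baseline. The paper does not state which test it used, the number of settled predictions, or the numeric value of the fair-odds baseline; the baseline of 0.5, the function name `win_rate_z_test`, and the counts below are all assumptions chosen for illustration only.

```python
import math


def win_rate_z_test(wins, settled, baseline=0.5):
    """One-sample z-test of an observed win rate against a fixed baseline.

    `baseline=0.5` is an assumed stand-in for the fair-odds baseline.
    Returns (observed rate, z statistic, two-sided p-value).
    """
    rate = wins / settled
    # Standard error of the proportion under the null hypothesis.
    se = math.sqrt(baseline * (1.0 - baseline) / settled)
    z = (rate - baseline) / se
    # Two-sided p-value from the standard normal distribution.
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return rate, z, p


# Hypothetical counts, NOT taken from the paper.
rate, z, p = win_rate_z_test(wins=5280, settled=10000)
print(f"win rate = {rate:.3f}, z = {z:.2f}, p = {p:.2e}")
```

Under the normal approximation, how far a rate of about 52.8% sits from 50% depends heavily on the number of settled predictions, which is why the sample size matters when reading the reported separation.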

Abstract

Contemporary LLM-based agent systems achieve striking surface fluency through prompt engineering and persona specification, but typically lack three properties that distinguish designed cognitive systems from instructed text generators: architectural depth across cognitive functions, falsifiable behavioral grounding that forces calibration against external reality, and operational existential parameters governing how an agent treats threats to its own continuity. This paper documents Lobster, a 19-module cognitive substrate organized under 15 top-level layers, instantiated across 21 named agents in a competitive prediction-market environment, and validates three architectural commitments empirically. Specifically, the author shows that: (i) the architecture's mortality-related parameter annihilationDread responds to template-violation events with regime-shift dynamics, exhibiting positive within-agent rank correlation in all 21 of 21 agents (Spearman ρ ∈ [0.19, 0.99]), and a Mann–Whitney U test on snapshots stratified by violation status rejects the null of no effect at p ≈ 0; (ii) reported confidence is empirically calibrated, with win rate increasing monotonically across confidence levels (low, medium, high) from 53.0% to 83.3% (Cochran–Armitage trend test, Z = 3.85, p = 1.16 × 10⁻⁴); (iii) the system's settled win rate is temporally stable across three weeks at 52.8% ± 1.85%, with statistical separation from the fair-odds baseline. The paper also reports transparent failure modes: a 76.6% void rate in one of five competitive domains caused by settlement-pipeline failure, six newly added cognitive modules present at the schema level but not yet receiving writes, and asymmetric crystallization of long-term-content fields.
The argument is that the design space Lobster occupies (multidisciplinary cognitive depth grounded in real-time falsification with operational existential parameters) is currently sparsely populated, and that the methodology of co-reporting architectural specification with module-level activation rates offers a useful template for future cognitive architecture papers. The paper's working name is "Designing Andrew", after Andrew Martin in Asimov's The Bicentennial Man, whose recognized personhood is grounded in structural rather than cosmetic change.
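The calibration claim in (ii) rests on a Cochran–Armitage trend test across the ordered confidence levels. The following is a minimal sketch of that test in its standard textbook form, using only the Python standard library. The equally spaced scores 0, 1, 2 for low, medium, high and the win and settlement counts are assumptions for illustration; the paper's per-level counts are not given in the abstract, so the output is not expected to reproduce Z = 3.85.

```python
import math


def cochran_armitage(wins, totals, scores=None):
    """Cochran-Armitage test for a linear trend in proportions.

    wins[i] / totals[i] is the win rate in ordered group i.
    Returns (Z statistic, two-sided p-value).
    """
    if scores is None:
        # Assumed equally spaced scores: 0 = low, 1 = medium, 2 = high.
        scores = list(range(len(wins)))
    n_total = sum(totals)
    p_bar = sum(wins) / n_total  # pooled win rate under the null
    # Score-weighted deviation of observed wins from their null expectation.
    t_stat = sum(s * (w - n * p_bar) for s, w, n in zip(scores, wins, totals))
    s1 = sum(n * s for s, n in zip(scores, totals))
    s2 = sum(n * s * s for s, n in zip(scores, totals))
    variance = p_bar * (1.0 - p_bar) * (s2 - s1 * s1 / n_total)
    z = t_stat / math.sqrt(variance)
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p


# Hypothetical counts per confidence level, NOT taken from the paper.
wins = [53, 60, 25]      # settled wins at low, medium, high confidence
totals = [100, 90, 30]   # settled predictions at each level
z, p = cochran_armitage(wins, totals)
print(f"Z = {z:.2f}, two-sided p = {p:.4f}")
```

A positive Z indicates that win rate tends to rise with the confidence score, which is the direction a calibrated system should show. Unlike a plain chi-square test of independence, the trend test uses the ordering of the levels, so it has more power against a monotone alternative.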
