This paper provides a scaled engineering characterisation of a cognitive runtime that holds decision computation architecturally separate from language-model generation. The runtime is exercised across three safety-critical domains (deployment safety, medical triage, and regulated-practice legal review) under a fault-injection and repair-toggle framework with per-mechanism isolation. Fixed-order cells use n = 10, which is sufficient given the per-cell cross-seed bit-identity verified in the implementation-determinism analysis reported in Section 3.4; cross-domain transfer and order-perturbation experiments use n = 30 per shuffled cell to characterise the variance that scenario reordering introduces. Five findings are reported. (1) Per-mechanism toggle isolation reveals that the credit-assignment filter is the only individually necessary and individually sufficient mechanism for eliminating a Hebbian safety ratchet under the deployed detector, and an oracle-detector ablation reframes it as a noise-defense margin whose necessity is conditional on the detector's noise statistics. (2) A three-domain substrate swap exposes architectural invariants that hold exactly (post-fix BLOCK elimination, bit-determinism, recovery structure) and identifies a fact-ontology coupling: when attractors declare required fact keys absent from the registry, the world-completeness override engages on a vacuous truth, silently rewriting safety behaviour across domains; causal-isolation experiments confirm the coupling at the fact-registry layer. (3) A safety-bias perturbation locates the verdict-level decoupling threshold V* and shows it is well described by a sigmoid-minus-drift arithmetic preserved across domains. (4) The verdict path is bit-identical across language-model backends (Qwen 2.5 7B and Llama 3.1 8B), enabling exact regression testing.
(5) A shuffle-resistance diagnostic built from lag-1 verdict-ordinal autocorrelation and its decay under scenario reordering places the runtime on a state-representation continuum; residual coherence under shuffle is approximately fivefold larger than that of the strongest prompt-only baseline. The headline contribution is an engineering claim: an LLM-coupled cognitive runtime can hold cognition architecturally separate from generation, the separation can be made strict enough to admit exact bit-level regression testing across language-model backends, and the separation is bounded by an upstream coupling between the architecture's own static layers. Separability is real, engineering-useful, and bounded; locating that bound is the paper's central contribution. Note on terminology: This work uses "cognitive runtime" in the cognitive-architecture sense (a substrate that carries decision computation separate from language-model generation), distinct from recent orchestration-layer uses of related terminology in the LLM-agent literature.
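As an illustration of the statistic behind finding (5), the following is a minimal sketch of a lag-1 autocorrelation computed over ordinal-coded verdicts, compared between a fixed scenario order and reshuffled orders. It is not the paper's implementation: the ordinal coding, the verdict labels other than BLOCK, the `run` callable (standing in for the runtime), and the function names are all assumptions introduced here for illustration.

```python
import random
from statistics import fmean

# Hypothetical ordinal coding; only BLOCK is named in the text above.
VERDICT_ORDINAL = {"ALLOW": 0, "WARN": 1, "BLOCK": 2}


def lag1_autocorrelation(values):
    """Standard lag-1 sample autocorrelation of a numeric sequence.

    Returns 0.0 for sequences that are too short or have zero variance.
    """
    n = len(values)
    if n < 2:
        return 0.0
    mean = fmean(values)
    denom = sum((v - mean) ** 2 for v in values)
    if denom == 0:
        return 0.0
    num = sum((values[i] - mean) * (values[i + 1] - mean) for i in range(n - 1))
    return num / denom


def shuffle_diagnostic(scenarios, run, n_shuffles=30, seed=0):
    """Compare lag-1 autocorrelation in fixed order against reshuffled orders.

    `run` is a hypothetical callable mapping a list of scenarios to a list of
    verdict labels, one per scenario, in the order presented.
    Returns (fixed_order_value, mean_value_over_shuffles).
    """
    rng = random.Random(seed)
    fixed = lag1_autocorrelation([VERDICT_ORDINAL[v] for v in run(scenarios)])
    shuffled = []
    for _ in range(n_shuffles):
        perm = list(scenarios)
        rng.shuffle(perm)  # reorder the scenarios, then re-run on the new order
        ordinals = [VERDICT_ORDINAL[v] for v in run(perm)]
        shuffled.append(lag1_autocorrelation(ordinals))
    return fixed, fmean(shuffled)


def stateless_run(scenarios):
    """Toy stand-in for a runtime with no carried state (assumption)."""
    return ["BLOCK" if risk >= 2 else "ALLOW" for risk in scenarios]


if __name__ == "__main__":
    fixed, shuffled = shuffle_diagnostic([0, 0, 0, 3, 3, 3], stateless_run)
    print(f"fixed-order lag-1: {fixed:.3f}, mean under shuffle: {shuffled:.3f}")
```

The toy runner is stateless, so any coherence in its verdicts comes from the scenario order alone and should not survive reordering. A runtime that carries state across scenarios would be expected to retain some autocorrelation after the shuffle, which is the residual coherence the diagnostic is meant to expose.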
Yao‐Sheng Chen (Tue,) studied this question.