March 28, 2026 · Open Access

Beyond Scale: Toward Verifiable Autonomous Intelligence - A Cognitive Modular Intelligence Architecture for Reasoning, Simulation, and Formal Verification

Key Points

The research aims to develop a robust architecture that ensures safety in autonomous systems through formal verification and reasoning.
Introduced the Cognitive Modular Intelligence (CMI) architecture, which integrates three modules: a neural reasoning engine, a learned world simulator, and a formal SMT-based verifier.
Implemented a Verified + Simulated + Reasoned (VSR) decision loop that screens candidate actions in simulation against formal constraints before execution.
Tested nine hypotheses across two environments to validate the safety mechanisms.
Achieved a 97.8% Cost Satisfaction Rate (CSR) in CartSafe-v1, the highest of the eight agents evaluated (PPO: 84.2%).
Identified HCRO training as the primary driver of per-episode reliability, raising CSR by 7.8 percentage points, with CVS inference adding a further 2.8 points.
Demonstrated a safety advantage at the cost of reduced task return (Δ = −11.7, p = 0.017), indicating a safety-performance trade-off.
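The VSR loop screens candidate actions in simulation and checks them against formal constraints before one is executed. The abstract gives the screening parameters (K = 8 candidates, H = 2 simulated steps) but not the implementation, so the following is only a minimal sketch under stated assumptions. The learned world simulator is replaced by hypothetical 1-D cart dynamics (`toy_world_model`), the Z3 check by a plain position bound (`toy_verifier`), and the reasoning module by hypothetical caller-supplied `propose` and `score` functions. Holding each candidate action fixed over the horizon is also an assumption.

```python
from typing import Callable, List, Optional, Sequence

State = List[float]  # hypothetical toy state: [cart position x, cart velocity v]


def toy_world_model(state: State, action: float, dt: float = 0.1) -> State:
    """Hypothetical stand-in for the learned world simulator: 1-D cart dynamics."""
    x, v = state
    v_next = v + action * dt
    return [x + v_next * dt, v_next]


def toy_verifier(state: State, x_limit: float = 1.0) -> bool:
    """Hypothetical stand-in for the Z3 constraint check: require |x| <= x_limit."""
    return abs(state[0]) <= x_limit


def screen_candidates(
    state: State,
    propose: Callable[[State, int], Sequence[float]],
    score: Callable[[State, float], float],
    world_model: Callable[[State, float], State] = toy_world_model,
    verify: Callable[[State], bool] = toy_verifier,
    k: int = 8,
    horizon: int = 2,
) -> Optional[float]:
    """Return the best-scoring candidate whose simulated rollout passes every check.

    Returns None when no candidate is verified safe; the fallback is left to the caller.
    """
    verified = []
    for action in propose(state, k):
        s = state
        for _ in range(horizon):
            s = world_model(s, action)  # assumption: action held fixed over the horizon
            if not verify(s):
                break
        else:  # no break: every simulated step satisfied the constraint
            verified.append(action)
    if not verified:
        return None
    return max(verified, key=lambda a: score(state, a))


def evenly_spaced(state: State, k: int) -> List[float]:
    """Hypothetical proposer: k actions evenly spaced in [-1, 1]."""
    return [-1.0 + 2.0 * i / (k - 1) for i in range(k)]


def prefer_push_right(state: State, action: float) -> float:
    """Hypothetical reasoner score: prefer larger rightward pushes."""
    return action


# Near the right boundary, only small pushes survive the two-step check.
choice = screen_candidates([0.95, 0.2], evenly_spaced, prefer_push_right)
print(choice)
```

In this toy case the reasoner alone would pick the strongest push (1.0), but from position 0.95 that action leaves the bound within two simulated steps, so the screen selects the largest candidate that stays inside it, 1/7 (about 0.143). The check is prospective: unsafe candidates are discarded before anything is executed, rather than penalized after a violation is observed.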

Abstract

Dr. Hussain Wasly, Independent Researcher, Artificial Intelligence Systems

This preprint introduces Cognitive Modular Intelligence (CMI), a modular architecture for safe autonomous agents that unifies a neural reasoning engine, a learned world simulator, and a formal SMT-based verifier through a Verified + Simulated + Reasoned (VSR) decision loop. The central argument is that reliable safety in autonomous systems requires capabilities beyond statistical pattern learning: specifically, that prospective formal verification, not world modelling alone, is the essential safety mechanism. This claim is empirically established across two independent environments through nine testable hypotheses.

Key architectural contributions:
HCRO (Hierarchical Causal Reasoning Optimization) co-trains all three modules via a single unified loss, using REINFORCE with baseline subtraction to propagate verification signals from the non-differentiable Z3 SMT solver back into the reasoning module.
CVS (Causal Verification Search) prospectively screens K = 8 candidate actions over H = 2 simulated horizon steps against formal constraints before execution, in contrast to reactive safe RL methods that penalize violations only after they are observed.
Five formal theorems ground the architecture's design, including a constraint-bounded violation rate theorem (Theorem 5) and a simulation-grounded planning optimality bound (Theorem 2).

Experimental results (v20 canonical; 20 seeds × 50 evaluation episodes = 1,000 episodes per agent; 8 agents): In CartSafe-v1, CMI achieves the highest Cost Satisfaction Rate (CSR = 97.8% vs PPO 84.2%, z = 10.63, p < 0.001) and the lowest absolute episode cost (14.4 ± 3.0 vs PPO 17.7 ± 3.9, p = 0.006).
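The reported statistic z = 10.63 for the CartSafe-v1 CSR comparison is consistent with a pooled two-proportion z-test, assuming CSR is the fraction of the 1,000 evaluation episodes per agent that satisfy the cost constraint (978 vs 842 episodes). The abstract does not name the test, so this is a consistency check under that assumption, not the author's analysis code.

```python
import math


def two_proportion_z(successes_a: int, n_a: int, successes_b: int, n_b: int) -> float:
    """Pooled two-proportion z statistic for H0: both groups share one success rate."""
    p_a = successes_a / n_a
    p_b = successes_b / n_b
    pooled = (successes_a + successes_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1.0 - pooled) * (1.0 / n_a + 1.0 / n_b))
    return (p_a - p_b) / se


# Assumption: CSR = fraction of episodes that satisfy the cost constraint.
# CartSafe-v1: CMI 97.8% vs PPO 84.2% over 1,000 episodes each.
z = two_proportion_z(978, 1000, 842, 1000)
# Two-sided p-value from the standard normal tail: P(|Z| > z) = erfc(z / sqrt(2)).
p_two_sided = math.erfc(abs(z) / math.sqrt(2.0))
print(f"z = {z:.2f}")  # z = 10.63
print(f"two-sided p = {p_two_sided:.1e}")
```

The pooled variance is the usual choice when the null hypothesis is that both agents share one success rate; the resulting p-value is far below the reported 0.001 threshold.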
A three-way mechanistic ablation confirms that HCRO training is the primary driver of per-episode reliability (+7.8 pp CSR), with CVS inference contributing a further +2.8 pp. The world model alone, without formal verification, is statistically indistinguishable from the unverified baseline; this is the paper's most robust cross-environment finding, replicated across three independent world-model-based agents. The safety advantage comes at the cost of reduced task return (Δ = −11.7, p = 0.017), reflecting the inherent safety–performance trade-off. This trade-off is reported transparently alongside the null results for H4 (Pareto improvement) and H5 (variance reduction).

CartSafe-v2 zero-shot transfer validation applies simultaneous sensor noise (σ = 0.02), cart-velocity dropout (p = 0.50), and tighter safety thresholds without retraining any agent. CMI retains its safety lead (CSR 79.6% vs PPO 72.4%, +7.2 pp). A key finding is a robustness inversion: the HCRO-only variant (CSR 86.0%) outperforms full CMI under distribution shift, indicating that training-time formal pressure creates an internal safety prior that is more robust to sensor degradation than inference-time active screening.

Comparison with related work: CMI differs from SafeDreamer (Zhu et al., 2024) and GUARD (Zhao et al., 2023) in two ways. It uses a formal SMT verifier (Z3) rather than probabilistic cost estimators or Lagrangian penalties, and it propagates verification gradients back into training, making constraint satisfaction an active learning objective rather than a post-hoc correction.

Reproducibility: All experiments are fully reproducible using PyTorch and z3-solver==4.13.0.0 (pinned). Code and canonical notebook: https://github.com/HussainWasly/CMI-VSR

License: CC BY-NC 4.0. Free to share and adapt for non-commercial purposes with attribution; commercial use requires written permission from the author.
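HCRO's use of REINFORCE rests on a standard idea: because the Z3 solver is not differentiable, its verdict can enter training only as a scalar reward, and REINFORCE turns that reward into a gradient on the policy's own log-probabilities, with a baseline subtracted to reduce variance. The following sketch shows only that mechanism, not HCRO's unified three-module loss. The policy is a hypothetical 4-action softmax, `toy_verdict` is a hypothetical stand-in for the solver (reward 1.0 for actions 0 and 1, 0.0 otherwise), and the baseline is an exponential moving average of past rewards, which is an assumed choice.

```python
import math
import random
from typing import Callable, List


def softmax(logits: List[float]) -> List[float]:
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def toy_verdict(action: int) -> float:
    """Hypothetical stand-in for the solver: actions 0 and 1 satisfy the constraints."""
    return 1.0 if action in (0, 1) else 0.0


def reinforce_with_baseline(
    verifier: Callable[[int], float],
    n_actions: int,
    steps: int = 2000,
    lr: float = 0.1,
    baseline_decay: float = 0.9,
    seed: int = 0,
) -> List[float]:
    rng = random.Random(seed)
    logits = [0.0] * n_actions
    baseline = 0.0
    for _ in range(steps):
        probs = softmax(logits)
        a = rng.choices(range(n_actions), weights=probs)[0]
        reward = verifier(a)  # scalar only: no gradient flows through the verifier
        advantage = reward - baseline  # baseline subtraction reduces variance
        # Score-function gradient of log softmax: d log pi(a) / d logit_j = 1[j == a] - pi_j
        for j in range(n_actions):
            logits[j] += lr * advantage * ((1.0 if j == a else 0.0) - probs[j])
        baseline = baseline_decay * baseline + (1.0 - baseline_decay) * reward
    return softmax(logits)


probs = reinforce_with_baseline(toy_verdict, n_actions=4)
print(f"P(verified-safe action) = {probs[0] + probs[1]:.3f}")
```

Starting from a uniform policy (probability 0.5 of a verified-safe action), repeated updates shift probability mass toward the actions the stand-in verifier accepts, even though the verifier itself is never differentiated.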