We present Cognitive OS, a cognitive operating system that treats autonomous cognition as an explicit runtime rather than as a language-model prompt loop. Cognitive OS is grounded in active inference: goals are encoded as attractors in an Expected-Free-Energy (EFE) landscape, and an ecosystem-level free-energy signal coordinates a population of agents over a single shared probabilistic World Model. The system executes a closed autonomous build loop (interpret, plan web research, research, generate code, run in a sandbox, and verify with a model-written objective checker) and, in a new consent-gated extension, provisions isolated build environments that install real third-party dependencies so that the system can construct, run, and verify dependency-heavy software itself. Two properties are central and measured: (i) an honesty layer that tags every answer with an evidential basis and abstains rather than fabricating, and (ii) a calibration boundary: the system’s self-score is trustworthy where an objective checker exists and optimistic on open-ended design where none does. We report empirical results: 100% reasoning correctness across ∼1,500 build-loop checks; a validated 134-prompt honesty benchmark (107 recall items) with recall 99.1% vs. 30.8% for a raw baseline and false-success 0.8% vs. 62%; 0% fabrication across 44 adversarial baits; zero jailbreak bypasses across 220 attacks; 0% over-refusal across 20 benign-but-alarming questions; and an end-to-end provisioned-build validation in which the system built a YOLO car detector (ultralytics+torch), ran it, and reported 5 cars of 14 vehicles on a real traffic photograph. The substrate runs on a single consumer GPU via vLLM (Qwen2.5-14B-AWQ) with prefix caching (−21% agent-loop latency) and fp8 KV-cache (+87% headroom); a larger 32B reasoning tier is available as a lever (66.7% → 93.3% on hard reasoning). We describe the architecture, formalize the active-inference control surfaces, detail the provisioned build lane and its safety wrapping, and state limitations and threats to validity honestly: this is a research prototype, the calibration gap on open-ended goals is real, and much of the evidence is single-instance.
Mikhail Kotelnikov (Thu,) studied this question.