What question did this study set out to answer?

The aim is to evaluate the effectiveness of an active-inference agent compared to a standard memory-greedy baseline in non-stationary environments.

May 7, 2026 · Open Access

Nonstationary Battery

Key Points

Three falsification batteries were conducted with identical agent code and episode budgets.
Test environments included a tabular bandit with hidden-rule shifts, an antimicrobial-resistance scenario, and a sepsis environment.
Four discount settings were tested on the bandit (γ ∈ {0.90, 0.95, 0.99, 0.999}).
In the bandit environment, the agent underperformed the baseline by 7–15 percentage points across all four settings.
In the AMR environment, the agent beat the baseline by 22.3 percentage points on post-shift optimal-treatment rate.
The agent won only where per-instance diagnostic signals carried high information about the latent state, as in the AMR results.
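
To make the baseline and the bandit battery concrete, here is a minimal sketch of a memory-greedy agent (sample-mean greedy over a sliding window, with epsilon-exploration, as the abstract describes it) on a bandit whose best arm changes at trials 100, 200, and 300. Only the shift points come from the study. The class and function names, the per-arm window of 30 rewards, epsilon = 0.1, four arms, and the 0.8/0.2 payoff probabilities are all assumptions made for illustration, not the study's actual configuration.

```python
import random
from collections import deque


class SlidingWindowGreedy:
    """Sample-mean greedy over the last `window` rewards per arm (hypothetical baseline)."""

    def __init__(self, n_arms, window=30, epsilon=0.1, seed=0):
        # One bounded buffer per arm: old rewards fall out automatically.
        self.history = [deque(maxlen=window) for _ in range(n_arms)]
        self.epsilon = epsilon
        self.rng = random.Random(seed)

    def act(self):
        # With probability epsilon, explore uniformly at random.
        if self.rng.random() < self.epsilon:
            return self.rng.randrange(len(self.history))
        # Untried arms get an optimistic value so each is sampled at least once.
        means = [sum(h) / len(h) if h else 1.0 for h in self.history]
        return max(range(len(means)), key=means.__getitem__)

    def observe(self, arm, reward):
        self.history[arm].append(reward)


def run(shift_points=(100, 200, 300), n_trials=400, n_arms=4, seed=0):
    """Return the fraction of trials on which the agent pulled the best arm."""
    env_rng = random.Random(seed)
    agent = SlidingWindowGreedy(n_arms, seed=seed + 1)
    best, hits = 0, 0
    for t in range(n_trials):
        if t in shift_points:
            best = (best + 1) % n_arms  # hidden rule changes: a new arm becomes best
        arm = agent.act()
        p = 0.8 if arm == best else 0.2  # assumed payoff probabilities
        reward = 1 if env_rng.random() < p else 0
        agent.observe(arm, reward)
        hits += arm == best
    return hits / n_trials
```

Calling `run()` returns the optimal-arm rate for one seeded episode. Because both random generators are seeded, repeated calls give the same value, which mirrors the identical-seeds discipline of the batteries.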

Abstract

Project Magha, Paper 1. We test an architectural prediction: that an active-inference agent built on a Beta-Bernoulli pairwise world model with multiplicative posterior decay should dominate a memory-greedy sliding-window baseline in non-stationary worlds, the regime where, by structural argument, posterior-driven exploration ought to outperform sample-mean greedy with epsilon-exploration. Three falsification batteries are run with identical agent code, identical seeds, and identical episode budgets:

A procedurally generated tabular bandit with hidden-rule shifts at trials 100, 200, and 300. The agent loses by 7–15 percentage points across all four discount settings (γ ∈ {0.90, 0.95, 0.99, 0.999}).
An antimicrobial-resistance (AMR) environment with a mid-run geographic prevalence shift (south_asia → north_america; NDM-dominant → KPC-dominant). The agent wins by 22.3pp on post-shift optimal-treatment rate.
A sepsis environment with a mid-run patient-cohort shift (Gram-positive-leaning → Gram-negative-leaning). The agent versus memory-greedy.

The architectural rule that emerges is narrower than the original prediction: the agent wins only in regimes where per-instance diagnostic signals carry high information about the latent state. The AMR win is driven not by world-model adaptation across the shift, but by per-isolate test-information access: Magha drops 5.7pp pre→post-shift while still beating the memory-greedy baseline by a wide margin. On the bandit (no per-instance test channel) and on sepsis (low-information vital-signs tests), the categorical Beta-Bernoulli world model has no structural advantage. The headline implication: this architecture is closer to a diagnostic decision-support architecture than to a general Turing-ASI substrate.
We document the falsification because it is load-bearing for every subsequent paper in the series — particularly the gap-1 hypothesis that replacing the categorical pairwise world model with a learned hierarchical neural one delivers calibration gains and sample-efficiency wins. This is paper 1 in the Project Magha series, an empirical research notebook documenting the path toward a Turing-shaped child-machine via the Era-of-Experience loop (Sutton & Silver 2024) with active inference (Friston / Da Costa / Parr) as the efficiency layer.
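
The abstract does not spell out the form of the multiplicative posterior decay, so the following is only a sketch of one common reading: before each update, the evidence accumulated above a Beta(1, 1) prior is multiplied by the discount γ, and the new Bernoulli outcome is then added. The class name `DecayingBeta`, the helper `track`, the uniform prior, and the decay-toward-prior rule are assumptions for illustration and may differ from the Magha implementation.

```python
from dataclasses import dataclass


@dataclass
class DecayingBeta:
    """Beta(alpha, beta) belief over a Bernoulli success rate, with forgetting."""

    gamma: float = 0.95  # discount; default picked from the tested grid for illustration
    alpha: float = 1.0   # pseudo-count of successes (starts at the assumed prior)
    beta: float = 1.0    # pseudo-count of failures (starts at the assumed prior)

    def update(self, success: bool) -> None:
        # Shrink the evidence above the Beta(1, 1) prior, then add the new outcome.
        self.alpha = 1.0 + self.gamma * (self.alpha - 1.0) + (1.0 if success else 0.0)
        self.beta = 1.0 + self.gamma * (self.beta - 1.0) + (0.0 if success else 1.0)

    @property
    def mean(self) -> float:
        return self.alpha / (self.alpha + self.beta)

    @property
    def effective_samples(self) -> float:
        # Evidence beyond the prior; stays below 1 / (1 - gamma).
        return self.alpha + self.beta - 2.0


def track(gamma: float, outcomes: list[bool]) -> float:
    """Posterior mean after feeding `outcomes` through a fresh belief."""
    belief = DecayingBeta(gamma=gamma)
    for outcome in outcomes:
        belief.update(outcome)
    return belief.mean
```

Under this reading, γ sets the memory length: the evidence retained never exceeds 1 / (1 − γ) observations, roughly 10 at γ = 0.90 and 1000 at γ = 0.999. A small γ therefore adapts quickly after a shift but keeps the posterior wide, while a large γ behaves almost like a stationary posterior.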
