May 13, 2026 · Open Access

Agency as a Self-Closing Loop: Mutual Surprisal and the Optimization Gap

Key Points

The paper aims to define agency through a structural account based on mutual surprisal and to trace that account's implications across different kinds of systems.
The account is enumerated case by case: RNA, bacteria, Aplysia, and humans count as agents; thermostats, tornados, and most current LLMs do not.
The closure timescale τc is operationalized as the temporal scale at which dynamical dependence across the agent-environment partition is minimized, which inherits validated measurement methods from the causal-emergence tradition.
The speculative content is concentrated in two named conjectures: A (extension to other systems that satisfy the structural conditions) and B (content-non-selectivity of gap-monitoring).
Reinforcement learning, the free energy principle, predictive coding, active inference, and control theory are argued to share one structural pattern: a minimization-shape objective whose bare optimum coincides with collapse of mutual surprisal across the loop.
If Conjecture A holds in the AI direction, a capability-refusal correlation in deployed AI is predicted.
If Conjecture B also holds, gap-monitoring cannot be selectively oriented across content domains, and that correlation is architectural.

Abstract

What is an agent? This paper offers a candidate structural account: agents are systems sustaining mutual surprisal across a self-closing causal loop on its own closure timescale. The account is enumerated rather than universal (RNA, bacteria, Aplysia, and humans are in; thermostats, tornados, and most current LLMs are out) and is extended beyond the enumerated cases by a named conjecture (Conjecture A). Universality is not claimed; universality without a state space cannot be cashed. The selection is theory-laden: the four in-cases share other properties (metabolic closure, autopoietic organization, thermodynamic openness) any of which a different framework could elevate. The case for mutual surprisal rests on what follows from it, not on the cases forcing it. What follows is a cross-framework pattern. Reinforcement learning, the free energy principle, predictive coding, active inference, and control theory share a minimization-shape objective whose bare optimum coincides with collapse of mutual surprisal across the loop; their framework-specific machinery performs structurally similar work in preventing the bare optimum. Reward hacking, mode collapse, hallucination, and dark-room dynamics are proposed as slices of this single structural pattern, a reading independently circled by the Proxy Compression Hypothesis (Wang et al., 2026) and the mesa-optimization framework (Hubinger et al., 2019). The operationalization: τc is identified with the temporal scale at which dynamical dependence (Barnett and Seth, 2023) across the agent-environment partition is minimized. The identification inherits validated measurement methods from the causal-emergence tradition (Hoel and collaborators). Two named conjectures concentrate the speculative content. Conjecture A (extension): the structural object extends to other systems satisfying the structural conditions. Conjecture B (content-non-selectivity): gap-monitoring oriented at proxy-requirement decoupling cannot be selectively oriented across content domains. A and B carry explicit falsification conditions in the body and are independent. If A holds in the AI direction, a capability-refusal correlation in deployed AI is predicted; if B additionally holds, the correlation is architectural. The AI extension is the most accessible test site for A. The framework is loop-prior and information-first, in the Walker-Davies-Pattee tradition; the contribution is synthetic.

Keywords: agency, agent-environment coupling, dynamical independence, causal emergence, mutual information, reward hacking, AI alignment, mesa-optimization, free energy principle
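
The abstract does not give a formal definition of mutual surprisal, so the following is only an illustrative sketch under stated assumptions: it uses ordinary Shannon mutual information between discretized agent and environment signals as a stand-in, estimated with a simple plug-in (count-based) estimator. The function names `mutual_information` and `lagged_mi` are hypothetical, and this is not the dynamical-dependence measure of Barnett and Seth nor a procedure for estimating τc. It only shows what "collapse" looks like in the simplest case: when one side of the loop stops varying, the shared information drops to zero.

```python
import math
from collections import Counter


def mutual_information(xs, ys):
    """Plug-in estimate of I(X;Y) in bits from paired discrete samples.

    Assumption: Shannon mutual information is used here as a stand-in;
    the paper's own definition of mutual surprisal may differ.
    """
    if len(xs) != len(ys) or len(xs) == 0:
        raise ValueError("xs and ys must be non-empty and of equal length")
    n = len(xs)
    joint = Counter(zip(xs, ys))
    px = Counter(xs)
    py = Counter(ys)
    mi = 0.0
    for (x, y), c in joint.items():
        # p(x,y) * log2(p(x,y) / (p(x) * p(y))), written with raw counts
        mi += (c / n) * math.log2(c * n / (px[x] * py[y]))
    return mi


def lagged_mi(agent, env, lag):
    """Estimate I(agent_t ; env_{t+lag}) for a non-negative integer lag."""
    if lag < 0 or lag >= len(agent):
        raise ValueError("lag must satisfy 0 <= lag < len(agent)")
    return mutual_information(agent[: len(agent) - lag], env[lag:])


# Toy signals (hypothetical data, for illustration only).
env = [0, 1, 0, 1, 0, 1, 0, 1]
coupled_agent = list(env)          # agent state tracks the environment
collapsed_agent = [0] * len(env)   # agent state no longer varies

coupled_bits = mutual_information(coupled_agent, env)
collapsed_bits = mutual_information(collapsed_agent, env)
```

In this toy case the coupled pair shares one bit and the collapsed pair shares none. Sweeping `lag` over a range and comparing the estimates is one crude way to look at how dependence varies with temporal scale, though a plug-in estimator is biased on short series and would need correction in any real analysis.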
