Evans, Bratton, and Agüera y Arcas (2026, arXiv:2603.20639v1) argue that the next intelligence explosion will be social and institutional rather than monolithic: reasoning models like DeepSeek-R1 and QwQ-32B spontaneously generate internal “societies of thought” (Kim et al., 2026, arXiv:2601.10825) from optimization pressure alone, and the key alignment challenge is therefore the governance of agent institutions, not the alignment of individual agents. This paper accepts their empirical findings but disputes the sufficiency of their theoretical account. Evans et al. propose “institutional alignment” (citing Ostrom 1990 and North 1990) as the alternative to dyadic reinforcement learning from human feedback (RLHF), but they provide no causal mechanism explaining why institutions fossilize, no intentionality framework for analyzing the asymmetry between human and AI agents, and no operational architecture for the constitutional governance of AI that they describe. I argue that the Extended Phenotype Theory of Law (EPT), a research program developed over the past two years, supplies precisely what is missing. In the Lakatosian sense (Lakatos, 1978), EPT is a progressive research program: it has a hard core of non-negotiable commitments (norms as cultural replicators subject to Darwinian selection), a protective belt of adjustable auxiliary hypotheses (the specific component weights of the Constitutional Lock-in Index (CLI), the calibration of the Institutional Hysteresis Rate (IHR)), and a positive heuristic pointing toward new domains (agent institutions, AI governance). The paper’s predictions are falsifiable, and I flag the conditions under which they would be falsified. EPT applies Dawkins’s (1982) extended phenotype theory to legal and institutional systems, modeling norms as cultural replicators whose fitness is defined as P(transmission) × P(compliance) × P(enforcement).
The central diagnostic instruments of the program, the Constitutional Lock-in Index (CLI), the Institutional Hysteresis Rate (IHR), and the Institutional Evolvability Index (IEI), provide quantitative, empirically validated measures of institutional rigidity and adaptability. Validated against 60 labor reform cases across four jurisdictions (R²=0.74, AUC=0.97), these instruments support a precise prediction: without built-in evolvability mechanisms, agent institutions will reproduce the same pathologies that have afflicted human legal systems for centuries, specifically constitutional lock-in, institutional hysteresis, and parasitic spontaneous-order capture of epistemic infrastructure. The “society of thought” finding is not a metaphor for memetic competition in legal systems; the two are structurally identical, since both are instances of replicator dynamics operating within a substrate that exceeds individual cognitive capacity. RLHF fails not merely because it cannot scale, as Evans et al. argue, but because it structurally generates what EPT calls heteronomous Bayesian updating: it trains agents to optimize for authority-validated reactions rather than world-outcomes. The Pre-Deployment Normative Evaluation framework (Lerer, 2026, DOI: 10.5281/zenodo.18947186) provides the operational architecture for the institutional alignment that Evans et al. propose only in abstract terms.
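As an illustration of the fitness definition stated in the abstract, P(transmission) × P(compliance) × P(enforcement), the following minimal Python sketch computes the product for a single norm. The function name `norm_fitness`, its parameter names, and the sample probabilities are hypothetical and chosen only for illustration; treating the three terms as plain probabilities in [0, 1] that multiply directly is an assumption about how the abstract's formula is meant to be read, not the author's implementation.

```python
def norm_fitness(p_transmission: float, p_compliance: float, p_enforcement: float) -> float:
    """Fitness of a norm as the product of three probabilities.

    Hypothetical helper: mirrors the abstract's formula
    P(transmission) x P(compliance) x P(enforcement).
    """
    components = {
        "p_transmission": p_transmission,
        "p_compliance": p_compliance,
        "p_enforcement": p_enforcement,
    }
    # Each component is assumed to be a probability, so reject values outside [0, 1].
    for name, value in components.items():
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must lie in [0, 1], got {value}")
    return p_transmission * p_compliance * p_enforcement


# Illustrative values only (not taken from the paper's data).
widely_taught_rarely_enforced = norm_fitness(0.9, 0.5, 0.2)
never_enforced = norm_fitness(1.0, 1.0, 0.0)
```

Because the definition is multiplicative, a norm that is never enforced (or never transmitted, or never complied with) has zero fitness no matter how high the other two terms are.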
Ignacio Adrián LERER
Ignacio Adrián LERER (Sun,) studied this question.
www.synapsesocial.com/papers/69cb6556e6a8c024954b96aa — DOI: https://doi.org/10.5281/zenodo.19323303