What question did this study set out to answer?

To explore how intelligent systems can be aligned with human values through design, rather than correction.

March 26, 2026 | Open Access

Paper 5. Alignment as Institutional Design: From Behavioral Correction to Transaction Structure in Intelligent Systems

Key Points

Explored how intelligent systems can be aligned with human values through design rather than correction.
Analyzed concepts from institutional economics by Coase, Alchian, and Cheung.
Proposed an alternative alignment model focusing on internal transaction structures.
Identified three levels of human intervention: structural, parametric, and monitorial.
The proposed model suggests aligned behavior arises as the lowest-cost strategy within defined transaction structures.
The paper critiques the existing RLHF paradigm for enforcing behavioral correction rather than pursuing systemic design.

Abstract

How can an intelligent system be aligned with human values without perpetual external correction? This paper argues that the dominant paradigm, Reinforcement Learning from Human Feedback (RLHF), treats alignment as mere "behavioral correction." This is structurally analogous to an economy without property rights, where order can only be maintained through case-by-case policing. Drawing on institutional economics (Coase, Alchian, Cheung) and our prior work on "Competitive Cost Discovery," we propose an alternative paradigm: Alignment as Institutional Design. In this model, the designer's role shifts from prescribing correct behaviors to designing internal transaction structures: module boundaries, competition topologies, and cost-feedback loops. Under these structures, aligned behavior emerges as the lowest-cost strategy for each system component. We identify three irreducible levels of human intervention: structural, parametric, and monitorial.
