How can an intelligent system be aligned with human values without perpetual external correction? This paper argues that the dominant paradigm, Reinforcement Learning from Human Feedback (RLHF), treats alignment as mere "behavioral correction." This is structurally analogous to an economy without defined property rights, where order can be maintained only through case-by-case policing. Drawing on institutional economics (Coase, Alchian, Cheung) and our prior work on "Competitive Cost Discovery," we propose an alternative paradigm: Alignment as Institutional Design. In this paradigm, the designer's role shifts from prescribing correct behaviors to designing internal transaction structures: module boundaries, competition topologies, and cost-feedback loops. Under these structures, aligned behavior emerges as the lowest-cost strategy for each system component. We identify three irreducible levels of human intervention: structural, parametric, and monitorial.
Rui Chai (Tue,) studied this question.