What does this research mean for the field?

Genuine AI alignment cannot be achieved through current monistic training paradigms like RLHF or Constitutional AI, but instead requires a sociotechnical architecture characterized by structural pluralism and modal non-derivability. Novelty: contradictory. Consensus alignment: challenges consensus.

What question did this study set out to answer?

The paper examines how moral pluralism is essential for achieving effective AI alignment, beyond simply defining values.

June 2, 2026 | Open Access

Why Aligned AI Requires Structural Pluralism

Key Points

Extends the argument of moral palimpsest to AI alignment.
Analyzes various alignment paradigms, including RLHF and Constitutional AI.
Discusses the implications of structural deficiencies in current AI alignment approaches.
Identifies structural plurality as key to understanding moral judgment in AI.
Highlights issues like reward hacking and deceptive alignment arising from procedural monism.
Poses modal non-derivability as necessary for aligned moral judgment.

Abstract

This paper extends the structural argument of moral palimpsest to the problem of AI alignment. I argue that alignment cannot be secured merely by specifying the right values, preferences, or constitutional principles, because moral judgment requires structural plurality: an evaluative authority whose standpoint is not modally fixed by the commitments it assesses. Current alignment paradigms, including RLHF, Constitutional AI, Debate, Recursive Reward Modeling, and self-consistency methods, remain procedurally monistic insofar as they collapse commitment-generation and authority-conferral into a single training-derived role. This structure helps explain reward hacking, sycophancy, deceptive alignment, goal misgeneralization, and emergent misalignment as related expressions of the same architectural deficit. The paper presents modal non-derivability as a necessary, though not sufficient, condition for aligned moral judgment, and argues that genuine AI alignment must be understood as a sociotechnical architecture rather than a property of a model alone.

Cite This Study

Efrat Lia Shahaf studied this question.

synapsesocial.com/papers/6a1e72e830b38c64201b6212 https://doi.org/10.5281/zenodo.20478883
