This paper extends the structural argument of moral palimpsest to the problem of AI alignment. I argue that alignment cannot be secured merely by specifying the right values, preferences, or constitutional principles, because moral judgment requires structural plurality: an evaluative authority whose standpoint is not modally fixed by the commitments it assesses. Current alignment paradigms, including RLHF, Constitutional AI, Debate, Recursive Reward Modeling, and self-consistency methods, remain procedurally monistic insofar as they collapse commitment-generation and authority-conferral into a single training-derived role. This structure helps explain reward hacking, sycophancy, deceptive alignment, goal misgeneralization, and emergent misalignment as related expressions of the same architectural deficit. The paper presents modal non-derivability as a necessary, though not sufficient, condition for aligned moral judgment, and argues that genuine AI alignment must be understood as a sociotechnical architecture rather than a property of a model alone.
Efrat Lia Shahaf studied this question.