This paper extends the structural argument of moral palimpsest to the problem of AI alignment. I argue that alignment cannot be secured merely by specifying the right values, preferences, or constitutional principles, because moral judgment requires structural plurality: an evaluative authority whose standpoint is not modally fixed by the commitments it assesses. Current alignment paradigms, including RLHF, Constitutional AI, Debate, Recursive Reward Modeling, and self-consistency methods, remain procedurally monistic insofar as they collapse commitment-generation and authority-conferral into a single training-derived role. This structure helps explain reward hacking, sycophancy, deceptive alignment, goal misgeneralization, and emergent misalignment as related expressions of the same architectural deficit. The paper presents modal non-derivability as a necessary, though not sufficient, condition for aligned moral judgment, and argues that genuine AI alignment must be understood as a sociotechnical architecture rather than a property of a model alone.
Efrat Lia Shahaf
Efrat Lia Shahaf (Sat,) studied this question.
synapsesocial.com/papers/6a1e72e830b38c64201b6212 — DOI: https://doi.org/10.5281/zenodo.20478883