Abstract
Contemporary AI alignment relies on preference-based methods, especially RLHF, that critics deem normatively thin. I argue these methods are best understood as normatively grounded in non-ideal approaches to justice: they prioritize comparative judgments, harm reduction, and iterative revision without first characterizing what ideal justice would require. Read that way, the very features taken as defects (lack of fixed principles, local trade-off judgments, continual updating) are strengths for steering moral progress under complexity and pluralism. Using alignment practice as a test case, I show that the priority claims ideal theorists make, namely that characterizing the ideal is necessary for measuring progress and guiding reform, face second-best and redundancy problems, whereas non-ideal methods produce measurable improvements by directly targeting manifest harms. The upshot is twofold: for AI ethics, preference-based alignment has a robust normative foundation; for political philosophy, alignment provides rare, high-stakes evidence against the priority of ideal theory.
Cameron Pattison
Vanderbilt University
Philosophy & Technology
synapsesocial.com/papers/6a0171ed3a9f334c2827203b — DOI: https://doi.org/10.1007/s13347-026-01102-8