What question did this study set out to answer?

This research explores how preference-based methods in AI alignment are grounded in non-ideal approaches to justice and their implications for moral progress.

May 11, 2026 | Open Access

Non-Ideal Foundations for Preference-Based AI Alignment

Cameron Pattison, Vanderbilt University

Key Points

Analyzed preference-based AI alignment practices, focusing on reinforcement learning from human feedback (RLHF).
Evaluated the criticism that these approaches are normatively thin.
Demonstrated how non-ideal frameworks address complexity and pluralism in ethical considerations.
Preference-based AI alignment shows measurable improvements by addressing manifest harms directly.
Critics' claims regarding the necessity of ideal justice for progress encounter significant theoretical challenges.
Non-ideal methods reveal strengths in promoting moral progress that ideal theories do not account for.

Abstract

Contemporary AI alignment relies on preference-based methods, especially RLHF, that critics deem normatively thin. I argue these methods are best understood as normatively grounded in non-ideal approaches to justice: they prioritize comparative judgments, harm reduction, and iterative revision without first characterizing what ideal justice would require. Read that way, the very features taken as defects (lack of fixed principles, local trade-off judgments, continual updating) are strengths for steering moral progress under complexity and pluralism. Using alignment practice as a test case, I show that the priority claims ideal theorists make (that characterizing the ideal is necessary for measuring progress and guiding reform) face second-best and redundancy problems, whereas non-ideal methods produce measurable improvements by directly targeting manifest harms. The upshot is twofold: for AI ethics, preference-based alignment has a robust normative foundation; for political philosophy, alignment provides rare, high-stakes evidence against the priority of ideal theory.
