Current AI safety frameworks assert their requirements rather than deriving them. Alignment theory asserts that AI goals should match human goals. Constitutional AI asserts a set of principles. RLHF asserts that human preferences should guide model behavior. Each is reasonable; none can explain why those requirements and not others. This paper presents two commitments, derived from first principles, that address gaps the current discourse leaves open. Commitment 1 (the Instrument Thesis): computational systems should be classified as instruments for human flourishing (not agents, not tools that might become agents, but instruments), and this classification follows from a principle that cannot be coherently rejected. Commitment 2 (the Accountability Principle): the entity that deploys a system bears accountability for its effects, whether that system is a human workforce or a computational one. These commitments interlock: without the first, accountability has no standard; without the second, the standard has no enforcement. Together they provide the structural foundation any viable safety framework requires.
Douglas Doane (Tue,) studied this question.