What question did this study set out to answer?

The study aims to identify fundamental commitments necessary for effective AI safety frameworks.

April 16, 2026 | Open Access

Two Interlocking Commitments AI Safety Frameworks Are Missing

Key Points

Analyzed existing AI safety frameworks
Derived commitments from first principles
Proposed the Instrument Thesis and Accountability Principle
Presented the Instrument Thesis, which classifies computational systems as instruments for human flourishing
Introduced the Accountability Principle linking system deployment to accountability
Explained the interlocking nature of these commitments essential for safety frameworks

Abstract

Current AI safety frameworks assert their requirements rather than deriving them. Alignment theory asserts that AI goals should match human goals. Constitutional AI asserts a set of principles. RLHF asserts that human preferences should guide model behavior. Each is reasonable; none can explain why those requirements and not others. This paper presents two commitments, derived from first principles, that address gaps the current discourse leaves open. Commitment 1 (the Instrument Thesis): computational systems should be classified as instruments for human flourishing (not agents, not tools that might become agents, but instruments), and this classification follows from a principle that cannot be coherently rejected. Commitment 2 (the Accountability Principle): the entity that deploys a system bears accountability for its effects, whether the system is a human workforce or computational. These commitments interlock: without the first, accountability has no standard; without the second, the standard has no enforcement. Together they provide the structural foundation any viable safety framework requires.
