What question did this study set out to answer?

The aim is to reformulate Asimov's Three Laws of Robotics to improve AI safety and alignment with human values.

June 1, 2026 | Open Access

Beyond the Three Laws: A Formal Reformulation of Asimovian Constraints for Artificial General and Superintelligence

Key Points

The aim is to reformulate Asimov's Three Laws of Robotics to improve AI safety and alignment with human values.
Reformulates Asimov's laws as properties of the objective, the optimization pressure, the learned object, and the self-modification operator.
Gives theorem-level statements and proofs for four results: the value of corrigibility, the quantilizer safety/capability bound, the Gaussian law of regressional Goodhart, and the strong duality that replaces lexical ordering.
Reports numerical experiments accompanying the corrigibility, conservatism, and Goodhart results.
Establishes a closed form for the value of corrigibility and shows its monotone dependence on human rationality.
Derives a safety/capability bound for quantilization.
States the obstructions that remain open, including preference unidentifiability, and argues that the prior, not the objective, is the binding constraint on alignment.

Abstract

Asimov's Three Laws of Robotics are a set of natural-language deontic constraints appended to an agent whose objective is otherwise unconstrained. We argue that this is the wrong type signature for a safety specification: almost every failure mode, including Asimov's own Zeroth-Law takeover, follows from imposing lexical constraints on the actions of a fixed-objective maximizer. We reformulate the laws as properties of (i) the objective, recast as a posterior over human values within an assistance game; (ii) the optimization pressure, bounded by quantilization; (iii) the learned object, via inner-alignment requirements; and (iv) the self-modification operator, via reflective stability. We give theorem-level statements and proofs for four sharpened results (a closed form for the value of corrigibility and its monotone dependence on human rationality, the quantilizer safety/capability bound, the Gaussian law of regressional Goodhart, and the strong duality that replaces lexical ordering), and we state, with their hypotheses made explicit, the cited obstructions that remain open: preference unidentifiability, the power-seeking tendency of optimal policies, mesa-optimization, and the Löbian obstacle to self-trust. A unifying observation organizes the whole: the binding constraint on alignment is the prior, not the objective, because the prior is precisely the coordinate the data cannot correct. Numerical experiments accompany the corrigibility, conservatism, and Goodhart results.
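
To make the regressional Goodhart and quantilization ideas concrete, here is a minimal Python sketch. It is not the paper's experiment. It assumes the textbook Gaussian setting, in which a proxy U = V + E is the true value V plus independent zero-mean noise E, so that E[V | U = u] = u · σ_V² / (σ_V² + σ_E²). It also assumes the standard quantilizer guarantee that sampling from the top q-fraction of a base distribution raises the expected cost by at most a factor of 1/q. The function names, parameter values, and the choice of V² as a stand-in cost are all hypothetical; the abstract does not give the paper's exact statements.

```python
import random


def shrinkage_factor(sigma_v, sigma_e):
    """Coefficient c in E[V | U = u] = c * u for U = V + E,
    with V ~ N(0, sigma_v^2) and E ~ N(0, sigma_e^2) independent."""
    return sigma_v ** 2 / (sigma_v ** 2 + sigma_e ** 2)


def simulate(n, sigma_v, sigma_e, seed=0):
    """Draw n (proxy, true value) pairs from the assumed Gaussian model."""
    rng = random.Random(seed)
    samples = []
    for _ in range(n):
        v = rng.gauss(0.0, sigma_v)      # true value
        u = v + rng.gauss(0.0, sigma_e)  # noisy proxy
        samples.append((u, v))
    return samples


def quantilize(samples, q):
    """Keep the top q-fraction of samples ranked by the proxy u.
    Sampling uniformly from this set is a q-quantilizer over the
    empirical base distribution."""
    ranked = sorted(samples, key=lambda s: s[0], reverse=True)
    k = max(1, int(q * len(ranked)))
    return ranked[:k]


def mean(xs):
    xs = list(xs)
    return sum(xs) / len(xs)


if __name__ == "__main__":
    sigma_v, sigma_e, q = 1.0, 1.0, 0.1  # illustrative values only
    base = simulate(100_000, sigma_v, sigma_e, seed=1)
    top = quantilize(base, q)

    # Regressional Goodhart: the selected true value lags the selected proxy
    # by the shrinkage factor, because E[V | U] is linear in U.
    ratio = mean(v for _, v in top) / mean(u for u, _ in top)
    print("predicted shrinkage:", shrinkage_factor(sigma_v, sigma_e))
    print("empirical ratio    :", ratio)

    # Quantilizer bound with a nonnegative stand-in cost (here V^2):
    # the cost under the quantilizer is at most (1/q) times the base cost.
    base_cost = mean(v * v for _, v in base)
    top_cost = mean(v * v for _, v in top)
    print("cost under quantilizer:", top_cost, "bound:", base_cost / q)
```

Lowering `q` applies more optimization pressure to the proxy, which loosens the 1/q cost bound; raising `sigma_e` shrinks the fraction of the selected proxy score that reflects true value.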
