Asimov's Three Laws of Robotics are a set of natural-language deontic constraints appended to an agent whose objective is otherwise unconstrained. We argue that this is the wrong type signature for a safety specification: almost every failure mode, including Asimov's own Zeroth-Law takeover, follows from imposing lexical constraints on the actions of a fixed-objective maximizer. We reformulate the laws as properties of (i) the objective, recast as a posterior over human values within an assistance game; (ii) the optimization pressure, bounded by quantilization; (iii) the learned object, via inner-alignment requirements; and (iv) the self-modification operator, via reflective stability. We give theorem-level statements and proofs for four sharpened results: a closed form for the value of corrigibility and its monotone dependence on human rationality, the quantilizer safety/capability bound, the Gaussian law of regressional Goodhart, and the strong duality that replaces lexical ordering. We also state, with their hypotheses made explicit, the cited obstructions that remain open: preference unidentifiability, the power-seeking tendency of optimal policies, mesa-optimization, and the Löbian obstacle to self-trust. A unifying observation ties these results together: the binding constraint on alignment is the prior, not the objective, because the prior is precisely the coordinate the data cannot correct. Numerical experiments accompany the corrigibility, conservatism, and Goodhart results.
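To make the quantilization item concrete, here is a minimal sketch of a q-quantilizer over a finite action set. It assumes the standard construction (rank actions by utility, keep the top q fraction of base-distribution probability mass, splitting the boundary action if needed, and renormalize by 1/q); the paper's own formulation may differ. The names `quantilizer_distribution`, `expectation`, and the toy base policy, utility, and cost are hypothetical illustrations. The final check illustrates the usual safety side of the bound: because no action's probability is inflated by more than a factor of 1/q, the expected value of any nonnegative cost under the quantilizer is at most 1/q times its expected value under the base distribution.

```python
from typing import Callable, Dict, Hashable


def quantilizer_distribution(
    base: Dict[Hashable, float],
    utility: Callable[[Hashable], float],
    q: float,
) -> Dict[Hashable, float]:
    """Return the q-quantilizer's distribution over a finite action set.

    base maps each action to its base-distribution probability (summing to 1).
    Actions are ranked by utility, the top q of base probability mass is kept
    (the boundary action may be split), and the kept mass is rescaled by 1/q.
    """
    if not 0.0 < q <= 1.0:
        raise ValueError("q must lie in (0, 1]")
    remaining = q  # base probability mass still to be allocated
    dist: Dict[Hashable, float] = {}
    for action in sorted(base, key=utility, reverse=True):
        if remaining <= 0.0:
            break
        take = min(base[action], remaining)
        if take > 0.0:
            dist[action] = take / q  # never exceeds base[action] / q
        remaining -= take
    return dist


def expectation(dist: Dict[Hashable, float], f: Callable[[Hashable], float]) -> float:
    """Expected value of f under a finite distribution."""
    return sum(p * f(a) for a, p in dist.items())


if __name__ == "__main__":
    base = {a: 0.1 for a in range(10)}   # hypothetical uniform base policy
    utility = lambda a: float(a)         # hypothetical proxy utility
    cost = lambda a: float(a == 9)       # hypothetical nonnegative cost
    q = 0.25
    quant = quantilizer_distribution(base, utility, q)
    print("quantilizer:", quant)
    print("utility gain:", expectation(quant, utility) - expectation(base, utility))
    # Safety bound: E_Q[cost] <= E_base[cost] / q for any cost >= 0.
    print("cost bound holds:",
          expectation(quant, cost) <= expectation(base, cost) / q + 1e-12)
```

Smaller q buys more expected utility over the base policy at the price of a looser cost guarantee, which is the safety/capability trade-off the abstract refers to.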
Alfredo Sepulveda-Jimenez studied this question.