Asimov’s Laws of Robotics are widely invoked as the canonical statement of robot safety, yet they have resisted implementation because their central terms (harm, inaction, human, obedience) are unquantified and context-dependent. This paper argues that the Laws should be read not as a reward specification to be optimized but as a hierarchy of safety guardrails to be enforced, in direct analogy to the guardrails that reduced the risk profile of large language models without ever achieving completeness. We formalize the Laws as a lexicographically constrained Markov decision process and propose a concrete, buildable architecture, a shield-and-deference loop, that industry can adopt as a common safety substrate. We present the architecture with explicit trust boundaries, then specify its three layers: (i) a non-learned shield, built from control barrier functions and reachability analysis, that enforces a hard, offline-verifiable floor over the specifiable subset of harm; (ii) a deference layer that maintains a Bayesian posterior over an unspecifiable harm-cost function and, by a loss-asymmetry argument, treats halting or querying a human as the optimal action whenever tail risk is high; and (iii) a budgeted query policy that selects which questions are worth a human’s attention by maximizing a decision-theoretic value of information under a shadow price on interruptions. We give a residual-risk decomposition with three auditable design knobs and show how the architecture maps onto the First, Second, Third, and Zeroth Laws. We do not claim to prevent all harm; we claim a principled, verifiable reduction of it, and we are explicit about the failure modes that remain.
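To make the three-layer ordering concrete, the following is a minimal Python sketch of one decision step of a shield-and-deference loop. It is an illustration under stated assumptions, not the paper's implementation: the names `Candidate`, `select_action`, and `cvar` are hypothetical, the shield is reduced to a discrete-time barrier condition h(x') >= (1 - alpha) h(x), tail risk is measured with a sample-based CVaR over posterior draws, the value of information is passed in as a precomputed number, and halting is assumed to be a safe fallback.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Sequence

HALT = "halt"
QUERY = "query"


def cvar(samples: Sequence[float], tail: float) -> float:
    """Mean of the worst `tail` fraction of sampled harm costs."""
    ordered = sorted(samples, reverse=True)
    k = max(1, int(round(tail * len(ordered))))
    return mean(ordered[:k])


@dataclass
class Candidate:
    name: str
    task_reward: float
    barrier_next: float            # h(x') under the specifiable harm model
    harm_samples: Sequence[float]  # posterior draws of the unspecified harm cost


def select_action(
    candidates: Sequence[Candidate],
    barrier_now: float,
    alpha: float = 0.5,         # assumed barrier decay rate
    tail: float = 0.1,          # assumed tail fraction for CVaR
    risk_limit: float = 1.0,    # assumed tolerable tail harm
    voi: float = 0.0,           # assumed precomputed value of information
    shadow_price: float = 0.1,  # assumed cost of interrupting a human
) -> str:
    # Layer (i): non-learned shield. Reject any action that violates
    # the barrier condition, regardless of reward.
    safe = [c for c in candidates if c.barrier_next >= (1 - alpha) * barrier_now]
    if not safe:
        return HALT

    # Layer (ii): deference. Among shielded actions, keep those whose
    # posterior tail risk is within the limit, then maximize task reward.
    acceptable = [c for c in safe if cvar(c.harm_samples, tail) <= risk_limit]
    if acceptable:
        return max(acceptable, key=lambda c: c.task_reward).name

    # Layer (iii): budgeted query. Ask a human only when the value of
    # information exceeds the shadow price; otherwise halt.
    return QUERY if voi > shadow_price else HALT


fast = Candidate("fast", 5.0, 0.2, [0.1] * 10)
slow = Candidate("slow", 2.0, 0.8, [0.1] * 10)
risky = Candidate("risky", 3.0, 0.9, [0.0] * 9 + [5.0])
print(select_action([fast, slow, risky], barrier_now=1.0))
```

The filtering order mirrors the lexicographic priority described in the abstract: the hard floor is checked first, uncertain harm second, and task reward only among actions that pass both. In this toy setup `fast` is rejected by the shield and `risky` by the tail-risk check, so the loop selects `slow`.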
Igor Chizhov (Sat,) studied this question.