What question did this study set out to answer?

The aim is to formalize Asimov's Laws of Robotics into a usable safety architecture for human-robot interactions.

June 22, 2026 | Open Access

Algorithmic Guardrails for Human-Interacting Robots: A Shield-and-Deference Formalization of the Laws of Robotics

Key Points

Formalized Asimov's Laws of Robotics as a hierarchy of safety guardrails to be enforced, not a reward specification to be optimized.
Introduced a shield-and-deference loop architecture built on a lexicographically constrained Markov decision process.
Developed a non-learned shield, using control barrier functions and reachability analysis, that enforces an offline-verifiable floor over the specifiable subset of harm.
Specified a budgeted query policy that chooses which questions merit a human's attention by maximizing a decision-theoretic value of information under a shadow price on interruptions.
Claimed a principled, verifiable reduction of harm rather than prevention of all harm.
Mapped the architecture onto the First, Second, Third, and Zeroth Laws and gave a residual-risk decomposition with three auditable design knobs.
Stated explicitly the failure modes that remain.

Abstract

Asimov’s Laws of Robotics are widely invoked as the canonical statement of robot safety, yet they have resisted implementation because their central terms — harm, inaction, human, obedience — are unquantified and context dependent. This paper argues that the Laws should be read not as a reward specification to be optimized but as a hierarchy of safety guardrails to be enforced, in direct analogy to the guardrails that reduced the risk profile of large language models without ever achieving completeness. We formalize the Laws as a lexicographically constrained Markov decision process and propose a concrete, buildable architecture — a shield-and-deference loop — that an industry can adopt as a common safety substrate. We present the architecture with explicit trust boundaries, then specify its three layers: (i) a non-learned shield, built from control barrier functions and reachability analysis, that enforces a hard, offline-verifiable floor over the specifiable subset of harm; (ii) a deference layer that maintains a Bayesian posterior over an unspecifiable harm-cost function and, by a loss-asymmetry argument, treats halting or querying a human as the optimal action whenever tail risk is high; and (iii) a budgeted query policy that selects which questions are worth a human’s attention by maximizing a decision-theoretic value of information under a shadow price on interruptions. We give a residual-risk decomposition with three auditable design knobs and show how the architecture maps onto the First, Second, Third, and Zeroth Laws. We do not claim to prevent all harm; we claim a principled, verifiable reduction of it, and we are explicit about the failure modes that remain.
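
As a rough illustration of how the three layers described in the abstract could fit together, the following Python sketch runs one step of a shield-and-deference loop. It is a toy under stated assumptions, not the paper's implementation: the one-dimensional dynamics, the barrier function h(x) = x_max - x, the use of a tail average (CVaR) over posterior samples as the tail-risk measure, and every name and default (`shield`, `cvar`, `decide`, `risk_limit`, `voi`, `shadow_price`) are hypothetical choices made for this example.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class Decision:
    action: str     # "act", "query", or "halt"
    control: float  # control actually sent to the robot


def shield(x: float, u: float, x_max: float, alpha: float) -> float:
    """Non-learned shield for the assumed toy dynamics dx/dt = u.

    Safe set: h(x) = x_max - x >= 0. The control barrier condition
    dh/dt >= -alpha * h(x) reduces to u <= alpha * (x_max - x),
    so the shield clips the proposed control to that bound.
    """
    return min(u, alpha * (x_max - x))


def cvar(samples: Sequence[float], tail: float) -> float:
    """Mean of the worst `tail` fraction of sampled harm costs."""
    ordered = sorted(samples, reverse=True)
    k = max(1, int(round(tail * len(ordered))))
    return sum(ordered[:k]) / k


def decide(
    x: float,
    u_nominal: float,
    harm_samples: Sequence[float],  # draws from a posterior over harm cost (assumed given)
    *,
    x_max: float = 1.0,
    alpha: float = 1.0,
    tail: float = 0.1,
    risk_limit: float = 1.0,    # hypothetical tail-risk threshold
    voi: float = 0.0,           # estimated value of information of asking (assumed given)
    shadow_price: float = 0.5,  # hypothetical cost of one human interruption
) -> Decision:
    # Layer 1: the shield always filters the control first.
    u_safe = shield(x, u_nominal, x_max, alpha)
    # Layer 2: act only when the posterior tail risk is acceptable.
    if cvar(harm_samples, tail) <= risk_limit:
        return Decision("act", u_safe)
    # Layer 3: ask a human only if the question is worth the interruption.
    if voi > shadow_price:
        return Decision("query", 0.0)
    return Decision("halt", 0.0)
```

In this sketch the shield is applied before any learned or probabilistic judgment, so the hard constraint holds regardless of how the deference and query layers behave. A high tail risk leads to a query only when the estimated value of information exceeds the interruption cost; otherwise the robot halts.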
