This supplement studies a core limitation of observable-only, no-meta agents under exit-impossibility: the distinction between “good” and “bad” governance can be unidentifiable when mediator-side implementations are observationally equivalent from the agent’s local history (optionally including verification gates). We formalize governance unidentifiability as observational-equivalence classes over mediation mechanisms, and we define value-neutral robust progress as a minimax lower bound over a family of evaluation functionals, including necessity-evaluators that depend on necessity/viability trajectories rather than on a single moral axis. The main results include an impossibility theorem: if the admissible model class contains an observationally unrefutable “floor-failure” regime for some necessity-evaluator, then no feasible policy can guarantee a strictly positive robust lower bound. We then package the exclusion of that impossibility premise as a non-arbitrariness requirement and connect it operationally to three mechanisms that break silent contract switching: contestability, a retaliation-resistant right-to-refuse (safe-default modes whose future feasibility cannot be silently destroyed), and control-domain independence (cross-domain witnesses). The supplement also provides minimal structural counterexamples (shared roots, readout capture, retaliation) and delineates a publishable-vs-sensitive boundary to reduce dual-use risk.
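As a reading aid, the minimax bound and the impossibility argument can be sketched as follows. This is a minimal sketch under assumed notation, not the supplement's own formalism: the symbols $\Pi$ (feasible policies), $\mathcal{M}$ (admissible mediation mechanisms), $\mathcal{E}$ (evaluation functionals), $J_E(\pi, m)$ (progress of policy $\pi$ under mechanism $m$ as scored by evaluator $E$), and the normalization of the "floor" to $0$ are all introduced here for illustration.

```latex
% Assumed notation (hypothetical, not taken from the supplement):
%   \Pi          feasible policies
%   \mathcal{M}  admissible model class of mediation mechanisms
%   \mathcal{E}  family of evaluation functionals
%   J_E(\pi, m)  progress of \pi under m, scored by E, floor normalized to 0

% Value-neutral robust progress of a policy: worst case over
% mechanisms and evaluators.
\[
  V(\pi) \;=\; \inf_{m \in \mathcal{M}} \; \inf_{E \in \mathcal{E}} \; J_E(\pi, m).
\]

% Floor-failure premise: some mechanism m^\dagger stays in \mathcal{M}
% whatever the agent observes, and some necessity-evaluator E^\dagger
% scores every feasible policy at or below the floor under it.
\[
  \exists\, m^\dagger \in \mathcal{M},\; E^\dagger \in \mathcal{E} :
  \quad J_{E^\dagger}(\pi, m^\dagger) \le 0
  \quad \text{for all } \pi \in \Pi .
\]

% Consequence: the infimum defining V(\pi) ranges over (m^\dagger, E^\dagger),
% so no policy achieves a strictly positive robust bound.
\[
  \sup_{\pi \in \Pi} V(\pi)
  \;\le\; \sup_{\pi \in \Pi} J_{E^\dagger}(\pi, m^\dagger)
  \;\le\; 0 .
\]
```

Read this way, the non-arbitrariness requirement amounts to conditions under which such a pair $(m^\dagger, E^\dagger)$ can be excluded from $\mathcal{M} \times \mathcal{E}$, for example because a contest or a cross-domain witness would make the regime observationally refutable.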
K Takahashi
www.synapsesocial.com/papers/698435f0f1d9ada3c1fb566a — DOI: https://doi.org/10.5281/zenodo.18465306