This paper presents a theory-first framework for runtime AI oversight centered on pre-commitment monitoring, proxy faithfulness, and intervention feasibility. Its core claim is narrow: monitoring can improve intervention success only when a system can be observed, interpreted, and redirected before a hazardous trajectory reaches commitment. The framework organizes runtime oversight around three requirements: usable signal, sufficient remaining time, and retained intervention authority. It introduces Safety Slack, Sₜ, as a design margin comparing usable oversight capacity with effective hazard burden, and develops a phase-sensitive account of escalation through contact, attention, recognition, impulse, and commitment. The manuscript also distinguishes latent theoretical targets, operational proxy estimates, runtime control estimates, and decision-oriented adequacy margins. Optional formal supports from Optimal Stopping, Structural Causal Models, Information Theory, Control Barrier Functions, Semi-Markov timing, and adversarial monitoring are included as theoretical scaffolds, but the framework remains an empirical research scaffold rather than a safety guarantee. The intended contribution is a falsifiable structure for testing whether pre-commitment runtime oversight improves intervention success over output-only or post-commitment monitoring under realistic limits of proxy quality, latency, redirectability, adversarial pressure, and monitoring overhead.
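As a rough illustration of the three requirements and the Safety Slack margin described above, the following minimal sketch may help. It is not the manuscript's formalization: the abstract gives no formula for Sₜ, so the difference form (capacity minus burden) is an assumption. The `OversightState` fields, the function names, and the numeric values are likewise hypothetical.

```python
from dataclasses import dataclass


@dataclass
class OversightState:
    """Hypothetical snapshot of a monitored system at one time step."""
    signal_usable: bool          # monitor output can be observed and interpreted
    time_to_commitment: float    # estimated time left before commitment
    intervention_latency: float  # time needed to detect, decide, and act
    authority_retained: bool     # overseer can still redirect the system
    oversight_capacity: float    # usable oversight capacity (arbitrary units)
    hazard_burden: float         # effective hazard burden (same units)


def safety_slack(state: OversightState) -> float:
    """Assumed form of the margin: capacity minus burden (not given in the source)."""
    return state.oversight_capacity - state.hazard_burden


def intervention_feasible(state: OversightState) -> bool:
    """True only if signal, remaining time, and authority are all available."""
    enough_time = state.time_to_commitment > state.intervention_latency
    return state.signal_usable and enough_time and state.authority_retained


# Illustrative values only: an early-phase state and a near-commitment state.
early = OversightState(True, 5.0, 1.5, True, 0.8, 0.5)
late = OversightState(True, 1.0, 1.5, True, 0.8, 0.9)

print(intervention_feasible(early), safety_slack(early) > 0)
print(intervention_feasible(late), safety_slack(late) > 0)
```

Under these assumptions the early state passes all three checks with positive slack, while the late state fails the timing check and has negative slack. This is the kind of contrast the framework proposes to test empirically.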
Htet Ko Ko Naing
Htet Ko Ko Naing studied this question.
www.synapsesocial.com/papers/69f4443a967e944ac556748e — DOI: https://doi.org/10.5281/zenodo.19889482