This paper presents a toy simulation benchmark for the Sampling-Rate Hypothesis in runtime AI oversight. The benchmark tests a conditional claim: a higher monitoring cadence improves pre-commitment intervention only when three conditions hold. The monitored proxy must faithfully track the hazard, latency must stay low, and intervention must remain feasible before commitment. It compares eight monitoring conditions: output-only monitoring, low-cadence monitoring, high-cadence monitoring with good proxy quality, high-cadence monitoring with degraded proxy quality, high-cadence monitoring with high latency, adaptive cadence monitoring, tail-risk-aware cadence monitoring, and high-cadence monitoring with self-interference. Under toy assumptions, the results support the framework's central timing claim: higher sampling cadence can improve safe-stop success, but cadence alone is insufficient. Poor hazard discrimination can impose an unacceptable false-positive burden, high latency can erase the benefit of frequent sampling, and tail-risk-aware cadence gives the best safety-utility balance among the conditions tested. These results should be read as a simulation-based consistency check, not as validation of real-world deployment.
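The interaction between cadence, proxy quality, and latency can be illustrated with a minimal Monte Carlo sketch. This is not the paper's benchmark. The model and every name in it (`safe_stop_rate`, `interval`, `latency`, `detect_prob`, `window`) are hypothetical assumptions: a hazard appears at a random phase relative to the sampling grid and commits `window` time units later. The monitor samples every `interval` units, each sample detects the hazard with probability `detect_prob` (a stand-in for proxy faithfulness), and a detection only takes effect `latency` units after the sample. False positives are not modeled.

```python
import math
import random


def safe_stop_rate(interval, latency, detect_prob, window, trials=10_000, seed=0):
    """Estimate the fraction of hazards halted before commitment.

    Hypothetical toy model, not the paper's benchmark:
    - the monitor samples at times 0, interval, 2*interval, ...
    - a hazard begins at a uniformly random phase in [0, interval)
      and commits `window` time units later;
    - each sample at or after onset detects the hazard with probability
      `detect_prob` (a stand-in for proxy faithfulness);
    - a detection stops the run only if sample time + `latency`
      is no later than the commitment time.
    """
    rng = random.Random(seed)
    stops = 0
    for _ in range(trials):
        onset = rng.uniform(0, interval)
        commit = onset + window
        # First sampling instant at or after hazard onset.
        t = interval * math.ceil(onset / interval)
        # Keep sampling while a detection could still act in time.
        while t + latency <= commit:
            if rng.random() < detect_prob:
                stops += 1
                break
            t += interval
    return stops / trials


if __name__ == "__main__":
    # Compare illustrative settings: low cadence, high cadence,
    # high cadence with a weak proxy, and high cadence with high latency.
    for label, args in [
        ("low cadence", (5, 0, 0.9, 5)),
        ("high cadence", (1, 0, 0.9, 5)),
        ("high cadence, weak proxy", (1, 0, 0.2, 5)),
        ("high cadence, high latency", (1, 6, 0.9, 5)),
    ]:
        print(f"{label:28s} {safe_stop_rate(*args):.3f}")
```

In this sketch, latency larger than the hazard window drives the safe-stop rate to zero regardless of cadence, which mirrors the abstract's point that frequent sampling alone does not guarantee timely intervention.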
This study was conducted by Htet Ko Ko Naing.