This paper presents a toy simulation benchmark for the Sampling-Rate Hypothesis in runtime AI oversight. It evaluates whether monitoring cadence improves pre-commitment intervention only when proxy faithfulness, latency, and retained intervention feasibility remain adequate. The benchmark compares output-only monitoring, low-cadence monitoring, high-cadence monitoring with good proxy quality, high-cadence monitoring with degraded proxy quality, high-cadence monitoring with high latency, adaptive cadence monitoring, tail-risk-aware cadence monitoring, and high-cadence monitoring with self-interference. The results support the framework’s central timing claim under toy assumptions: higher sampling cadence can improve safe-stop success, but cadence alone is insufficient. Poor hazard discrimination can create unacceptable false-positive burden, high latency can erase the benefit of frequent sampling, and tail-risk-aware cadence provides the best safety-utility balance in this toy benchmark. The results should be interpreted as a simulation-based consistency check, not as real-world deployment validation.
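The timing claim in the abstract can be illustrated with a minimal sketch. This is not the paper's benchmark: the function `safe_stop_rate`, its parameters (`period`, `latency`, `window`, `horizon`), and the deterministic detection rule are all assumptions made for illustration. It assumes a hazard begins at some time step, the monitor detects it at the first sample at or after onset, and an intervention counts as a safe stop only if it lands within a fixed pre-commitment window. Proxy quality and false positives are deliberately left out.

```python
def safe_stop_rate(period: int, latency: int, window: int, horizon: int = 1000) -> float:
    """Fraction of hazard onsets stopped before commitment (hypothetical toy model).

    period  -- steps between monitor samples (lower means higher cadence)
    latency -- steps between detection and the intervention taking effect
    window  -- steps between hazard onset and irreversible commitment
    horizon -- number of onset times to enumerate, one per step
    """
    successes = 0
    for onset in range(horizon):
        # First sample at or after onset: round onset up to a multiple of period.
        first_sample = -(-onset // period) * period
        # Safe stop only if the intervention lands before the commitment point.
        if first_sample + latency <= onset + window:
            successes += 1
    return successes / horizon


if __name__ == "__main__":
    # Assumed illustrative settings, not values taken from the paper.
    print("low cadence, low latency  :", safe_stop_rate(period=20, latency=2, window=5))
    print("high cadence, low latency :", safe_stop_rate(period=2, latency=2, window=5))
    print("high cadence, high latency:", safe_stop_rate(period=2, latency=6, window=5))
```

Under these assumed settings, raising the cadence lifts the safe-stop rate from 0.2 to 1.0, while a latency longer than the window drives it to 0.0 regardless of cadence, which mirrors the abstract's point that cadence alone is insufficient.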
Htet Ko Ko Naing
Htet Ko Ko Naing (Tue,) studied this question.
synapsesocial.com/papers/69f2a47b8c0f03fd677638a4 — DOI: https://doi.org/10.5281/zenodo.19846009