What question did this study set out to answer?

This research investigates whether a safety gate can permit unbounded beneficial self-modification in a system while keeping cumulative risk bounded.

March 28, 2026 · Open Access

Information-Theoretic Limits of Safety Verification for Self-Improving Systems

Key Points

Formalizing criteria for bounded risk and unbounded utility
Proving classification impossibility using Hölder's inequality
Deriving tighter bounds through NP counting methods
Validating the theory empirically on GPT-2
Proven impossibility of achieving both bounded risk and unbounded utility with conventional classifiers
Hölder's inequality shows that the maximum classifier utility grows more slowly than linearly
A Lipschitz ball verifier can maintain zero risk while allowing positive utility
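The last key point can be illustrated with a minimal sketch. It assumes one plausible reading of a "Lipschitz ball verifier", not necessarily the paper's construction: a safety margin function g (safe when g > 0) is L-Lipschitz, the current parameters theta0 have margin m = g(theta0) > 0, and the gate accepts only modifications strictly inside the ball of radius m / L around theta0. Under those assumptions every accepted point has g > 0, so the gate never accepts an unsafe modification, yet it still accepts a set of positive volume. The names `certified_radius`, `ball_gate`, and the toy margin function are hypothetical.

```python
import math
from typing import Callable, Sequence


def certified_radius(margin: float, lipschitz: float) -> float:
    """Radius m / L inside which an L-Lipschitz margin function stays positive."""
    if margin <= 0 or lipschitz <= 0:
        raise ValueError("margin and lipschitz must both be positive")
    return margin / lipschitz


def ball_gate(
    theta0: Sequence[float],
    theta_new: Sequence[float],
    margin: float,
    lipschitz: float,
) -> bool:
    """Accept theta_new only if it lies strictly inside the certified ball.

    For any accepted point, g(theta_new) >= g(theta0) - L * dist > 0,
    assuming `lipschitz` really is a valid Lipschitz constant for g.
    """
    dist = math.dist(theta0, theta_new)  # Euclidean distance
    return dist < certified_radius(margin, lipschitz)


# Toy margin function (an assumption for illustration): g(theta) = 1 - ||theta||,
# which is 1-Lipschitz and equals 1 at the origin.
def toy_margin(theta: Sequence[float]) -> float:
    return 1.0 - math.hypot(*theta)


theta0 = [0.0, 0.0]
inside = [0.3, 0.4]   # distance 0.5 from theta0
outside = [0.9, 0.9]  # distance about 1.27 from theta0

accept_inside = ball_gate(theta0, inside, toy_margin(theta0), 1.0)
accept_outside = ball_gate(theta0, outside, toy_margin(theta0), 1.0)
```

The gate is deliberately conservative: it may reject safe modifications outside the ball, but under the stated assumptions it cannot accept an unsafe one, which is the sense in which risk is zero while utility stays positive.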

Abstract

Can a safety gate permit unbounded beneficial self-modification while maintaining bounded cumulative risk? We formalize this question through dual conditions, requiring Σδₙ […] 1, any classifier-based gate under overlapping safe/unsafe distributions forces ΣTPRₙ […] 0 (Theorem 2), validated on GPT-2 (d_LoRA = 147,456). Comprehensive empirical validation is in the companion paper.
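To make the role of Hölder's inequality concrete, here is a sketch of one standard way such a bound arises. It is not necessarily the paper's exact statement. The symbols are assumptions introduced for illustration: P₊ and P₋ are the distributions of safe and unsafe modifications, A is the gate's acceptance region, δ = P₋(A) is the risk of accepting an unsafe modification, TPR = P₊(A), α > 1, and P₊ is assumed absolutely continuous with respect to P₋ with finite C_α.

```latex
\begin{align*}
\mathrm{TPR} = P_{+}(A)
  &= \int \mathbf{1}_{A}\,\frac{dP_{+}}{dP_{-}}\,dP_{-} \\
  &\le \left( \int \left( \frac{dP_{+}}{dP_{-}} \right)^{\alpha} dP_{-} \right)^{1/\alpha}
       P_{-}(A)^{\,1 - 1/\alpha}
   \;=\; C_{\alpha}\,\delta^{\beta},
   \qquad \beta = 1 - \tfrac{1}{\alpha}. \\[4pt]
\text{If } \delta_{n} \le c\,n^{-p}: \quad
\sum_{n=1}^{N} \mathrm{TPR}_{n}
  &\le C_{\alpha}\,c^{\beta} \sum_{n=1}^{N} n^{-p\beta}
   \;=\;
   \begin{cases}
     O\!\left(N^{\,1 - p\beta}\right), & p\beta < 1, \\
     O(\log N), & p\beta = 1, \\
     O(1), & p\beta > 1.
   \end{cases}
\end{align*}
```

In every case the cumulative utility grows more slowly than linearly in N, and it stays bounded whenever pβ > 1.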
