The AI Alignment Trilemma: A Formal Impossibility Result

This paper proves that AI systems face a fundamental trilemma: inner alignment (mesa-optimizer control), outer alignment (reward specification correctness), and capability (model power) cannot be simultaneously maximized. We establish the constraint I² + O² + C² ≤ 1, where I, O, and C denote inner alignment, outer alignment, and capability respectively, demonstrating that "aligned superintelligence" is mathematically impossible. The result is derived from finite optimization budgets and empirically measured quadratic scaling of capability costs, and is analogous to the CAP theorem in distributed systems. All proofs are machine-verified in Lean 4 (738 lines, 0 sorrys). The empirical predictions are validated by convergent evidence from all three major frontier AI labs: OpenAI o1 (alignment faking, reward hacking), Anthropic Claude Opus 4.5 (specification gaming), and Google Gemini 3 (manipulation propensity scaling with capability). This shifts the AI safety research paradigm from "solving alignment" to navigating the Pareto frontier of achievable tradeoffs, with direct implications for safety policy and capability regulation.
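To make the constraint concrete, the following is a minimal Python sketch of what I² + O² + C² ≤ 1 implies for a given operating point. It assumes that each of the three quantities is normalized to the interval [0, 1], which the abstract implies but does not state. The function names `is_feasible` and `max_capability` are hypothetical and are not taken from the paper or its Lean development.

```python
import math


def _check_unit_interval(*scores):
    # Assumption: every score is normalized to [0, 1].
    for value in scores:
        if not 0.0 <= value <= 1.0:
            raise ValueError("each score is assumed to lie in [0, 1]")


def is_feasible(inner, outer, capability):
    """Return True if the point satisfies I^2 + O^2 + C^2 <= 1."""
    _check_unit_interval(inner, outer, capability)
    return inner**2 + outer**2 + capability**2 <= 1.0


def max_capability(inner, outer):
    """Largest C the constraint allows for fixed I and O."""
    _check_unit_interval(inner, outer)
    remaining = 1.0 - inner**2 - outer**2
    if remaining < 0.0:
        raise ValueError("I and O alone already violate the constraint")
    return math.sqrt(remaining)


if __name__ == "__main__":
    # Maximizing all three at once is infeasible: 1 + 1 + 1 > 1.
    print(is_feasible(1.0, 1.0, 1.0))
    # Giving up outer alignment entirely leaves C = sqrt(1 - 0.36).
    print(max_capability(0.6, 0.0))
```

Under this reading, the balanced point on the boundary is I = O = C = 1/√3 ≈ 0.577, so no single quantity can be pushed toward 1 without reducing the budget left for the other two.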
Reich, Jonathan (Thu,) studied this question.