June 3, 2026 | Open Access

Gold-Standard AGI: Outer AGI Superalignment

Key Points

This research aims to develop a comprehensive theory addressing the outer alignment problem of superintelligent AGI.
Develops foundational concepts for AGI alignment.
Decomposes alignment into outer and inner components.
Presents a self-contained theory to ensure broad accessibility.
Proposes a theoretically sound and accessible solution for outer AGI alignment.
Defines key concepts necessary for understanding AGI alignment.
Empowers AGI policymakers with foundational knowledge for governance.

Abstract

To maximise the net benefit of AGI (Artificial General Intelligence, and in particular agentic superintelligent AGI) for all humanity, without favouring any subset thereof, we imagine a Gold-Standard AGI that is maximally aligned and maximally validated. The first of these properties, alignment, is traditionally decomposed into outer alignment (how do we define a final goal $F_G$ that correctly states what we want?) and inner alignment (how do we build an agent $G$ that forever pursues $F_G$ as intended?). This paper presents a complete, foundational, and self-contained theory of AGI, culminating in an implementation-neutral solution to the outer AGI alignment problem in the case that $G$ is superintelligent (hence "superalignment"). Given the profound relevance of the AGI alignment problem to AGI governance, we adopt a pedagogic style throughout so that the paper is accessible to less technical readers such as AGI policymakers.