To maximise the net benefit of AGI (Artificial General Intelligence and, in particular, agentic superintelligent AGI) for all humanity, without favouring any subset thereof, we imagine a Gold-Standard AGI that is maximally aligned and maximally validated. The first of these properties, alignment, is traditionally decomposed into outer alignment (how do we define a final goal FGG that correctly states what we want?) and inner alignment (how do we build an agent G that forever pursues FGG as intended?). This paper presents a complete, foundational, and self-contained theory of AGI, culminating in an implementation-neutral solution to the outer AGI alignment problem in the case that G is superintelligent (hence "superalignment"). Given the AGI alignment problem's profound relevance to AGI governance, we adopt a pedagogic style throughout, so that the paper is accessible to less technical readers such as AGI policymakers.
Aaron Turner (Mon,) studied this question.