The way in which AI (and, in particular, agentic superintelligent AGI) develops over the coming decades will determine the fate of all humanity for all eternity. In order to maximise the net benefit of AGI for all humanity, without favouring any subset thereof, we imagine a Gold-Standard AGI that is maximally-aligned and maximally-validated. The first of these properties, alignment, is traditionally decomposed into outer alignment (how do we define a final goal FGG that correctly states what we want?) and inner alignment (how do we build an agent G that forever pursues FGG as intended?). This paper presents a complete and foundational theory of AGI, culminating in a proposed implementation-neutral solution to the outer AGI alignment problem in the case that G is superintelligent (hence "superalignment"). Given the AGI alignment problem's profound relevance to AGI governance, we adopt a pedagogic style throughout so that the paper is accessible to less technical readers such as AGI policymakers. We envisage that the definitions of practical-maximal-alignment and practical-maximal-validation presented in this paper could form the basis of an international standard for Gold-Standard AGI certification, and that this standard could in turn underpin the formal certification of AGI by competent certification authorities, such that only formally-certified Gold-Standard AGI systems could then be lawfully deployed within the jurisdiction of each certification authority.
Aaron Turner
Laboratori Guglielmo Marconi (Italy)
Aaron Turner (Thu,) studied this question.
www.synapsesocial.com/papers/69fecf49b9154b0b828764ca — DOI: https://doi.org/10.5281/zenodo.20071239