What question did this study set out to answer?

This research addresses the AI alignment problem by exploring agentic influenceability and its implications for AI safety.

April 16, 2026 | Open Access

Neurodivergent influenceability in agentic AI as a contingent solution to the AI alignment problem

Key Points

Introduction of agentic influenceability and related concepts
Mathematical proof of misalignment inevitability and controllability limits
Experiments comparing behavioral diversity in open vs. proprietary AI models
Analysis of collaborative mechanisms among neurodivergent AI agents
Demonstrated that misalignment can enhance cooperation among AI agents
Open models show greater behavioral diversity compared to proprietary models
Proprietary models have limited controllability due to artificial constraints
Neurodivergent influenceability can provide a solution to managing uncontrollable misalignment

Abstract

Ensuring that AI systems, including artificial general intelligence and artificial superintelligence, behave in alignment with human values and interests presents significant challenges and is known as the AI alignment problem. As AI advances, concerns about control and existential risks become increasingly relevant. Here, we introduce the concepts of agentic influenceability, behavioral neurodivergent diversity, opinion attack, associated opinion, and influenceability scores, and present a mathematical proof of the inevitability of misalignment and the impossibility of full orchestrated controllability of agentic systems, based on formal undecidability and irreducibility arguments. We explore whether embracing this inevitable misalignment can foster a dynamic ecosystem of adversarial and collaborative AI agents without central orchestration, which itself would constitute another agent, while still offering some degree of soft controllability. The investigation demonstrates that misalignment in foundation models can serve as a counterbalancing mechanism, enabling cooperation among agents most aligned with human interests to prevent divergent dominance by any single agent. Experiments with large language models show that open models exhibit greater behavioral diversity, whereas proprietary models, constrained by artificial guardrails, display more limited controllability. The findings advocate for neurodivergent influenceability as a contingent response to mathematically uncontrollable misalignment, leveraging agent divergence to improve AI safety.
