
May 25, 2026 · Open Access


The Recursive Dynamics Diagnosis of LLM and AGI Criteria

Lin Sun

Key Points

This research aims to diagnose the structural deficiencies of large language models (LLMs) and propose criteria for artificial general intelligence (AGI).
Conducted a rigorous dynamical diagnosis based on the first principles of Recursive Cognitive Informatics (RCI).
Evaluated LLMs against three dynamical indicators for AGI: global membrane-locking synchronization, noise-driven bifurcation statistics, and cross-context consistency.
Drew on published PersonQA and ARC-AGI-3 results to quantify the performance of current LLMs.
Newer reasoning models hallucinate more than their predecessor on PersonQA: o3 at 33% and o4-mini at 48%, compared with 16% for o1.
All evaluated frontier models scored below 1% on the ARC-AGI-3 benchmark, while humans scored 100%.
All current mainstream frontier models fail all three criteria and therefore lack the self-iterative completeness that the paper treats as a necessary condition for AGI.
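The PersonQA figures above can be put on a common scale with a few lines of Python. This is only arithmetic on the three reported rates (16%, 33%, 48%). The dictionary `PERSONQA_HALLUCINATION` and the helper `relative_to_baseline` are illustrative names introduced here, not part of the paper or of any benchmark tooling.

```python
# Hallucination rates on PersonQA as reported in the Key Points
# (OpenAI o3/o4-mini System Card, 2025). Names below are illustrative only.
PERSONQA_HALLUCINATION = {"o1": 0.16, "o3": 0.33, "o4-mini": 0.48}


def relative_to_baseline(rates: dict[str, float], baseline: str) -> dict[str, float]:
    """Return each model's hallucination rate as a multiple of the baseline model's rate."""
    base = rates[baseline]
    return {model: rate / base for model, rate in rates.items()}


if __name__ == "__main__":
    ratios = relative_to_baseline(PERSONQA_HALLUCINATION, "o1")
    for model, ratio in ratios.items():
        # Show the raw rate, the multiple of o1, and the gap in percentage points.
        gap_pp = (PERSONQA_HALLUCINATION[model] - PERSONQA_HALLUCINATION["o1"]) * 100
        print(f"{model}: {PERSONQA_HALLUCINATION[model]:.0%} "
              f"({ratio:.2f}x o1, {gap_pp:+.0f} pp)")
```

Run as a script, it places o3 at roughly 2.1 times and o4-mini at 3 times the o1 rate, which is the size of the gap the paper describes as a dynamical paradox.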

Abstract

Current frontier large language models, including GPT-5/5.4 (OpenAI, 2025), o3/o4-mini (OpenAI, 2025), Claude Opus 4.5/4.6 (Anthropic, 2025), Gemini 2.5 Pro/3.1 Pro (Google DeepMind, 2025–2026), and DeepSeek-R1 (2025), exhibit astonishing linguistic and reasoning capabilities, yet suffer from three structural deficiencies: hallucination, cross-conversation inconsistency, and the absence of a persistent self.

The test-time compute scaling paradigm, exemplified by o1/o3/o4-mini and GPT-5-thinking, partially simulates iterative feedback, but this paper demonstrates that it remains a bounded-depth unfolding of the deterministic rule R rather than a genuine modification function g. Particularly anomalous is the fact that the hallucination rates of o3 and o4-mini on the PersonQA benchmark (33% and 48%) are **higher** than that of the previous-generation reasoning model o1 (16%) (OpenAI o3/o4-mini System Card, 2025), revealing a dynamical paradox in which increased reasoning depth exacerbates instability. The ARC-AGI-3 benchmark (March 2026) offers further confirmation: all frontier models score below 1%, while humans score 100% (ARC-AGI-3 Technical Report, 2026). This paper proceeds from the first principles of Recursive Cognitive Informatics (RCI) to conduct a rigorous dynamical diagnosis of these systems and to propose dynamical criteria for AGI.

**At the diagnostic level**, current LLMs (including reasoning models) exhibit a structural absence of the modification function g during inference, or its degeneration into a bounded-depth deterministic unfolding; the system remains essentially a static mapping dominated by the generation function f. Hallucination is the inevitable destabilization of deterministic membrane-locked projection in the absence of genuine self-referential feedback, and additional reasoning steps amplify the accumulation of membrane-locked projection errors, causing the hallucination rate to rise non-monotonically with reasoning-chain length (arXiv:2509.06861, 2025). Cross-conversation inconsistency is the inevitable divergence of single-shot membrane-locked projections from different starting points, and the absence of a persistent self is the direct manifestation of the complete lack of a self-referential soliton.

**At the criteria level**, AGI judgment should not rely on behavioral imitation (current models have already bypassed the Turing test) but should rest on three measurable dynamical indicators: the persistence of global membrane-locking synchronization, the statistics of membrane-locking bifurcation driven by intrinsic noise, and the cross-context consistency of the self-referential closed loop. A system that passes all three criteria possesses self-iterative completeness, which constitutes a necessary condition for artificial general intelligence. This paper provides concrete experimental detection protocols for the three criteria and demonstrates that all current mainstream frontier models fail on all three.
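The abstract names cross-context consistency and noise-driven bifurcation statistics as measurable indicators; the paper's own detection protocols are not reproduced in this summary. The sketch below shows one hypothetical way such quantities might be approximated from model outputs: pairwise answer agreement across independent conversations, and the Shannon entropy of outcomes across reruns under sampling noise. Both functions (`cross_context_consistency`, `outcome_entropy`) and the string-matching normalization are assumptions for illustration, not the paper's protocols, and global membrane-locking synchronization is not modeled at all.

```python
import math
from collections import Counter
from itertools import combinations


def cross_context_consistency(answers: list[str]) -> float:
    """Hypothetical proxy: fraction of conversation pairs giving the same answer.

    `answers` holds one answer to the same probe question from each
    independent conversation. Normalization is plain case/whitespace folding.
    """
    normalized = [a.strip().lower() for a in answers]
    pairs = list(combinations(normalized, 2))
    if not pairs:
        return 1.0  # zero or one context is trivially self-consistent
    return sum(a == b for a, b in pairs) / len(pairs)


def outcome_entropy(outcomes: list[str]) -> float:
    """Hypothetical proxy: Shannon entropy (bits) of outcomes across noisy reruns.

    0.0 means every rerun landed on the same outcome; higher values mean the
    runs split across more outcomes, more evenly.
    """
    counts = Counter(outcomes)
    n = len(outcomes)
    # p * log2(1/p) form avoids returning -0.0 when one outcome dominates.
    return sum((c / n) * math.log2(n / c) for c in counts.values())


if __name__ == "__main__":
    # Hypothetical answers to one probe question from four separate conversations.
    runs = ["Paris", "paris", "Lyon", "Paris"]
    print(cross_context_consistency(runs))  # 3 of 6 pairs agree
    print(outcome_entropy([a.lower() for a in runs]))
```

Exact string matching is a deliberately crude choice here; a real protocol would more plausibly compare answers semantically, and a single probe question says little on its own without many reruns.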
