Current frontier large language models, including GPT-5/5.4 (OpenAI, 2025), o3/o4-mini (OpenAI, 2025), Claude Opus 4.5/4.6 (Anthropic, 2025), Gemini 2.5 Pro/3.1 Pro (Google DeepMind, 2025–2026), and DeepSeek-R1 (2025), exhibit astonishing linguistic and reasoning capabilities, yet suffer from three structural deficiencies: hallucination, cross-conversation inconsistency, and the absence of a persistent self. The test-time compute scaling paradigm, exemplified by o1/o3/o4-mini and GPT-5-thinking, partially simulates iterative feedback, but this paper demonstrates that it remains a bounded-depth unfolding of the deterministic rule R rather than a genuine modification function g. Particularly anomalous is the fact that the hallucination rates of o3 and o4-mini on the PersonQA benchmark (33% and 48%, respectively) are **higher** than that of the previous-generation reasoning model o1 (16%) (OpenAI o3/o4-mini System Card, 2025), revealing a dynamical paradox in which increased reasoning depth exacerbates instability. The ARC-AGI-3 benchmark (March 2026) further confirms this: all frontier models score below 1%, while humans score 100% (ARC-AGI-3 Technical Report, 2026). This paper proceeds from the first principles of Recursive Cognitive Informatics (RCI) to conduct a rigorous dynamical diagnosis of these systems and to propose dynamical criteria for AGI.

**At the diagnostic level**, current LLMs (including reasoning models) exhibit a structural absence of the modification function g during inference, or its degeneration into a bounded-depth deterministic unfolding; the system remains essentially a static mapping dominated by the generation function f. Hallucination is the inevitable destabilization of deterministic membrane-locked projection in the absence of genuine self-referential feedback; increasing the number of reasoning steps amplifies the accumulation of membrane-locked projection errors, causing the hallucination rate to rise non-monotonically with reasoning chain length (arXiv:2509.06861, 2025). Cross-conversation inconsistency is the inevitable divergence of single-shot membrane-locked projections from different starting points, and the absence of a persistent self is the direct manifestation of the complete lack of a self-referential soliton.

**At the criteria level**, AGI judgment should not rely on behavioral imitation (the Turing test has already been bypassed by current models), but should be based on three measurable dynamical indicators: the persistence of global membrane-locking synchronization, the statistics of membrane-locking bifurcation driven by intrinsic noise, and the cross-context consistency of the self-referential closed loop. A system that passes all three criteria possesses self-iterative completeness, which constitutes a necessary condition for artificial general intelligence. This paper provides concrete experimental detection protocols for the three criteria and demonstrates that all current mainstream frontier models fail on all three.
Lin Sun
Lin Sun (Sat,) studied this question.
synapsesocial.com/papers/6a13e81d0e02ee3982d32dcc — DOI: https://doi.org/10.5281/zenodo.20354642