What question did this study set out to answer?

This research aims to model semantic drift and error propagation in recursive symbolic systems within language.

March 26, 2026 | Open Access

Recursive Complexity and Semantic Drift: A Model of Error Propagation in Evolving Language Systems

Key Points

Developed a formal model of semantic drift using an information-theoretic framework.
Introduced a novel formula relating error probability to recursion depth and other factors.
Reviewed literature on information theory, iterated learning, and AI model collapse to support findings.
Proposed practical guidelines for mitigating collapse in applied AI systems.
Modeled error probability as increasing non-linearly with recursion depth, vocabulary size, polysemy, and system maturity.
Argued that semantic drift provides a theoretical bridge to model collapse in AI systems.
Identified metrics such as intrinsic dimensionality and effective rank to quantify semantic drift.

Abstract

This paper presents a formal model of "semantic drift," a process of cumulative error propagation in recursive symbolic systems. I propose that communicative degradation arises not from vocabulary size alone, but from the recursive reinterpretation of information under cognitive or computational constraints. Using an information-theoretic framework, the paper introduces a novel formula that models how error probability grows non-linearly as a function of recursion depth, vocabulary size, polysemy, and system maturity. I argue that this model of semantic drift provides a unifying theoretical bridge to the well-documented phenomenon of "model collapse" in artificial intelligence, where large language models (LLMs) degrade when recursively trained on their own synthetic output. The paper extends this model by proposing that the predicted semantic drift can be empirically quantified as a form of geometric representational collapse, measurable by metrics such as Intrinsic Dimensionality (ID) and Effective Rank (ERank). The analysis is supported by a literature review connecting information theory, iterated learning, and contemporary AI research on model collapse and neural collapse. Finally, I derive a set of practical, testable guidelines for practitioners to mitigate the risk of collapse in applied AI systems, including fine-tuning loops, agentic workflows, and low-resource language deployments. This work offers a mechanistic explanation for observed performance gaps in low-resource settings and provides a quantitative framework for designing more resilient and reliable artificial information systems.
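
As an illustration of one of the metrics named above, here is a minimal sketch of Effective Rank. It assumes the common definition (Roy and Vetterli): the exponential of the Shannon entropy of the normalized singular values of a representation matrix. The abstract does not say which definition the paper uses, so treat this as an assumption. The function name `effective_rank`, the `eps` threshold, and the convention of returning 0.0 for an all-zero matrix are hypothetical choices made for this example.

```python
import numpy as np


def effective_rank(matrix, eps=1e-12):
    """Effective rank: exp of the Shannon entropy of normalized singular values.

    `matrix` is assumed to be a 2-D array, e.g. one row per sample and one
    column per embedding dimension. No centering or scaling is applied here;
    whether to center first is a separate modeling choice.
    """
    # Singular values only; the singular vectors are not needed.
    singular_values = np.linalg.svd(np.asarray(matrix, dtype=float), compute_uv=False)

    total = singular_values.sum()
    if total <= eps:
        # Convention chosen for this sketch: an all-zero matrix has effective rank 0.
        return 0.0

    # Normalize the spectrum into a probability distribution.
    p = singular_values / total
    # Drop numerically zero entries so that log(0) is never evaluated.
    p = p[p > eps]

    entropy = -np.sum(p * np.log(p))
    return float(np.exp(entropy))


if __name__ == "__main__":
    rng = np.random.default_rng(0)

    # Toy "healthy" representations: 200 samples spread over 16 dimensions.
    healthy = rng.normal(size=(200, 16))

    # Toy "collapsed" representations: the same samples squeezed through
    # a 2-dimensional bottleneck, so at most 2 directions carry variance.
    bottleneck = rng.normal(size=(16, 2))
    collapsed = healthy @ bottleneck @ bottleneck.T

    print("healthy  :", effective_rank(healthy))
    print("collapsed:", effective_rank(collapsed))
```

Under this definition a 4-by-4 identity matrix has an effective rank of 4 and any rank-one matrix has an effective rank of 1. A falling value across successive generations of a recursive training loop would be one possible signal of the representational collapse the abstract describes. The toy data in the demo is synthetic and is not drawn from the paper.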
