This paper presents a formal model of "semantic drift," a process of cumulative error propagation in recursive symbolic systems. I propose that communicative degradation arises not from vocabulary size alone, but from the recursive reinterpretation of information under cognitive or computational constraints. Using an information-theoretic framework, the paper introduces a novel formula that models how error probability grows non-linearly as a function of recursion depth, vocabulary size, polysemy, and system maturity. I argue that this model of semantic drift provides a unifying theoretical bridge to the well-documented phenomenon of "model collapse" in artificial intelligence, where large language models (LLMs) degrade when recursively trained on their own synthetic output. The paper extends this model by proposing that the predicted semantic drift can be empirically quantified as a form of geometric representational collapse, measurable by metrics such as Intrinsic Dimensionality (ID) and Effective Rank (ERank). The analysis is supported by a literature review connecting information theory, iterated learning, and contemporary AI research on model collapse and neural collapse. Finally, I derive a set of practical, testable guidelines for practitioners to mitigate the risk of collapse in applied AI systems, including fine-tuning loops, agentic workflows, and low-resource language deployments. This work offers a mechanistic explanation for observed performance gaps in low-resource settings and provides a quantitative framework for designing more resilient and reliable artificial information systems.
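As a rough illustration of how a collapse metric such as Effective Rank might be computed, the sketch below assumes the common definition: the exponential of the Shannon entropy of the L1-normalized singular values. The abstract does not state the paper's own definition, so this is an assumption. The function name `effective_rank`, the `eps` cutoff, and both example spectra are hypothetical and are not data from the paper.

```python
import math


def effective_rank(singular_values, eps=1e-12):
    """Effective rank, assumed here to be exp(Shannon entropy) of the
    L1-normalized singular-value spectrum."""
    # Drop (near-)zero values so that log() is always defined.
    values = [s for s in singular_values if s > eps]
    if not values:
        raise ValueError("need at least one positive singular value")
    total = sum(values)
    probs = [s / total for s in values]
    entropy = -sum(p * math.log(p) for p in probs)
    return math.exp(entropy)


# Hypothetical spectra, for illustration only.
spread = [1.0, 1.0, 1.0, 1.0]          # variance spread evenly over 4 directions
collapsed = [1.0, 1e-6, 1e-6, 1e-6]    # nearly all variance in one direction

print(effective_rank(spread))     # close to 4
print(effective_rank(collapsed))  # only slightly above 1
```

In practice the singular values would come from a matrix of model representations, for example via `numpy.linalg.svd(X, compute_uv=False)`. Under this reading, a falling effective rank across successive training generations would be one possible signature of the representational collapse the abstract describes.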
Jeremy Weestrand (Tue,) studied this question.