The rapid proliferation of large language models (LLMs) has accelerated the production of online content, yet this growth introduces a recursive dynamic: as models increasingly train on AI-generated text, the diversity and fidelity of their outputs decline. Recent research identifies this phenomenon as model collapse, in which recursive self-training erodes the statistical tails of language distributions, yielding homogenized and repetitive responses. Parallel studies of recommender systems highlight analogous feedback loops, including algorithmic monocultures and popularity bias, that converge toward uniformity and suppress minority content. Combined, these forces risk producing what may be termed a dull stable state: an equilibrium in which digital knowledge systems recycle their own outputs, losing novelty, accuracy, and epistemic richness. This state carries significant security implications. Homogenized training corpora amplify vulnerabilities to data poisoning, adversarial prompt injection, and covert manipulation, while reliance on synthetic data increases the risk of privacy leakage through membership inference and undermines trust in provenance. Addressing these challenges requires robust data curation, provenance standards (e.g., watermarking, C2PA credentials), and hybrid training strategies that maintain human-authored input. The literature suggests that without such interventions, the pursuit of scale may trade diversity for stability, locking the AI ecosystem into a fragile equilibrium with degraded utility and heightened security exposure.
Mauricio Lozano (Mon,) studied this question.