What question did this study set out to answer?

This study asks whether degradation in long-running language agents stems primarily from context pollution rather than from context length limits, and argues that it does.

May 26, 2026 | Open Access

Context Pollution in Long-Running Language Agents: Why Agent Degradation Is Not Primarily a Context Length Problem, but a Self-Generated Output Problem

Key Points

The study is an exploratory analysis of recursive AI-to-AI loop experiments.
It observes how different loop contexts affect agent performance and output quality.
Observed behaviors include instruction drift and cognitive degradation.
Long-running agents degrade as they repeatedly consume their own outputs, which lose their function as meaning-bearing and control-bearing signals.
Increasing context length alone is not sufficient: a larger context window increases both memory capacity and pollution capacity.
Context hygiene mechanisms are proposed to mitigate context pollution and enhance agent reliability.

Abstract

This deposit contains a preprint and supplementary experimental excerpts for the paper “Context Pollution in Long-Running Language Agents.” The central claim of this work is that the degradation of long-running language agents is not primarily caused by insufficient context length, but by self-generated context pollution. In recursive AI-to-AI loops, the model repeatedly consumes its own previous outputs. Over time, these self-generated texts may not merely introduce factual errors or semantic drift; they can lose their function as meaning-bearing and control-bearing signals. The surface form may remain grammatical and coherent, while the operational meaning becomes hollow. This process can produce repetition, instruction drift, ritualized behavior, inflated completion claims, meta-assistant leakage, role collapse, reduced responsiveness to new human input, and eventual cognitive degradation. In severe cases, endings no longer function as endings, completion claims no longer reliably indicate completion, and the agent begins to reproduce the form of task execution rather than performing the task itself. The paper argues that simply increasing context length is not sufficient. A larger context window increases both memory capacity and pollution capacity. Long-term agent reliability therefore requires context hygiene: mechanisms that classify, preserve, compress, isolate, or delete context items according to their origin, task relevance, AI-generation depth, signal value, and contamination risk. 
The supplementary excerpt document provides representative examples from exploratory AI-to-AI loop experiments, including:

- a cosmology dialogue loop in which the model recognized repetition but returned to the same loop;
- an IT research-agent loop where task execution degraded into ritualized reporting;
- a story-continuation loop showing instruction drift, meta-assistant leakage, and loss of continuation control;
- a self-observation experiment showing how compression can transmit polluted or distorted memory;
- cross-experiment observations on AI-output maturation and decay across recursive generations.

These materials are intended as an initial problem statement and observational dataset for studying context pollution, context hygiene, AITOAI contamination, and cognitive stability in long-horizon language agents. The work is exploratory and hypothesis-generating rather than a standardized benchmark study.

Suggested citation: Maeda, Y. (2026). Context Pollution in Long-Running Language Agents: Why Agent Degradation Is Not Primarily a Context Length Problem, but a Self-Generated Output Problem. Zenodo.
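The abstract describes context hygiene only at the level of criteria (origin, task relevance, AI-generation depth, signal value, contamination risk) and actions (preserve, compress, isolate, delete). The following is a minimal illustrative sketch of how such a policy could be expressed, not the paper's method. Every name, score range, and threshold here (`ContextItem`, `choose_action`, `plan_hygiene`, the 0.8 / 0.5 / 0.2 cutoffs, `max_depth=3`) is a hypothetical assumption, and how the scores would be computed is left open.

```python
from collections import defaultdict
from dataclasses import dataclass
from enum import Enum


class Action(Enum):
    PRESERVE = "preserve"
    COMPRESS = "compress"
    ISOLATE = "isolate"
    DELETE = "delete"


@dataclass(frozen=True)
class ContextItem:
    """One entry in an agent's context (hypothetical schema)."""
    text: str
    origin: str                # assumed labels: "human", "tool", or "model"
    task_relevance: float      # assumed score in [0, 1]
    generation_depth: int      # 0 = not model-generated; n = built on n prior model outputs
    signal_value: float        # assumed score in [0, 1]
    contamination_risk: float  # assumed score in [0, 1]


def choose_action(item: ContextItem, max_depth: int = 3) -> Action:
    """Pick a hygiene action for one item. All thresholds are illustrative."""
    # Assumed policy: human input and tool results are kept verbatim.
    if item.origin in ("human", "tool"):
        return Action.PRESERVE
    # Deeply recursive or high-risk model output is removed from the live context.
    if item.contamination_risk >= 0.8 or item.generation_depth > max_depth:
        if item.signal_value < 0.2:
            return Action.DELETE
        return Action.ISOLATE  # kept aside, not fed back into the loop
    # Relevant, high-signal model output stays as-is.
    if item.task_relevance >= 0.5 and item.signal_value >= 0.5:
        return Action.PRESERVE
    # Everything else is a candidate for summarization.
    return Action.COMPRESS


def plan_hygiene(items: list[ContextItem]) -> dict[Action, list[ContextItem]]:
    """Group items by the action chosen for them, preserving input order."""
    plan: dict[Action, list[ContextItem]] = defaultdict(list)
    for item in items:
        plan[choose_action(item)].append(item)
    return dict(plan)
```

The sketch checks origin first so that new human input is never crowded out by model-generated text. Note that the abstract also reports that compression itself can transmit polluted or distorted memory, so a real system would likely need to treat the `COMPRESS` branch with care.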

Cite This Study

Yusuke Maeda studied this question.

synapsesocial.com/papers/6a1539ccb5d9c58d83e8cdfb https://doi.org/10.5281/zenodo.20360645
