Large language model (LLM) agents deployed on long-horizon tasks are routinely initialized with user-specified constraints. We identify and formally characterize a failure mode termed constraint drift: the progressive degradation of adherence to initial task constraints as agent execution extends across longer contexts and deeper sub-task chains. Through a systematic empirical study of three frontier LLMs (GPT-4o, Claude 3.5 Sonnet, Gemini 1.5 Pro) on four task domains, we demonstrate that constraint violation rates increase monotonically with task horizon length, with a mean 3.4x increase between the first and fourth task quartiles. We introduce a five-category taxonomy of drift, DRIFT-Bench (a 240-task benchmark), and three mitigation strategies: Periodic Constraint Injection (PCI), Hierarchical Constraint Anchoring (HCA), and Constraint-Aware Summarization (CAS). Combined, these strategies achieve up to 77% drift reduction.
Diwakar S (Wed,) studied this question.