What question did this study set out to answer?

The study aims to identify and characterize constraint drift in LLMs during long-horizon tasks.

June 26, 2026Open Access

Constraint Drift in Long-Horizon LLM Agents: How Language Models Forget What They Were Told to Do

Key Points

The study aims to identify and characterize constraint drift in LLMs during long-horizon tasks.
Conducted systematic empirical studies on three large language models (GPT-4o, Claude 3.5 Sonnet, Gemini 1.5 Pro) across four task domains.
Developed a five-category taxonomy of drift and established a benchmark called DRIFT-Bench with 240 tasks.
Implemented three mitigation strategies: Periodic Constraint Injection, Hierarchical Constraint Anchoring, and Constraint-Aware Summarization.
Identified a monotonic increase in constraint violation rates with task horizon length, averaging a 3.4x increase from the first to the fourth task quartile.
Achieved up to 77% reduction in drift when the mitigation strategies were combined.

Abstract

Large language model (LLM) agents deployed on long-horizon tasks are routinely initialized with user-specified constraints. We identify and formally characterize a failure mode termed constraint drift — the progressive degradation of adherence to initial task constraints as agent execution extends across longer contexts and deeper sub-task chains. Through systematic empirical study across three frontier LLMs (GPT-4o, Claude 3.5 Sonnet, Gemini 1.5 Pro) on four task domains, we demonstrate that constraint violation rates increase monotonically with task horizon length, with a mean 3.4x increase between the first and fourth task quartile. We introduce a five-category taxonomy of drift, DRIFT-Bench (a 240-task benchmark), and three mitigation strategies — Periodic Constraint Injection (PCI), Hierarchical Constraint Anchoring (HCA), and Constraint-Aware Summarization (CAS) — achieving up to 77% drift reduction when combined.

Constraint Drift in Long-Horizon LLM Agents: How Language Models Forget What They Were Told to Do

Key Points

Abstract

Cite This Study

Also Consider

Also Consider