
May 21, 2026 | Open Access

Trajectory Drift and Execution Validity in Multi-Step LLM Workflows

Key Points

Studied how execution trajectories evolve in multi-step workflows of large language models (LLMs).
Introduced a deterministic framework built on replayable lexical and structural signals.
Separated continuation, drift, branching, and convergence behaviors using trajectory-relative measurements.
Analyzed a controlled cross-provider corpus of replayable traces captured from OpenAI and Anthropic models.
Observed a local-versus-global mismatch: continuity between adjacent steps can remain high while persistence to the originating trajectory progressively weakens.
Found that execution families exhibit distinct transition behaviors and divergence characteristics.
Results suggest that multi-step execution requires analyzing how execution state evolves over time, not request-level telemetry alone.
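The local-versus-global mismatch summarized above can be illustrated with a toy trajectory. The paper's exact lexical and structural signals are not given on this page, so this Python sketch rests on stated assumptions: each step is reduced to a set of lowercased word tokens, similarity is Jaccard overlap, and the helper names (`tokens`, `jaccard`, `local_continuity`, `origin_persistence`, `mismatch_steps`) and thresholds are hypothetical, not the authors' method.

```python
from __future__ import annotations

import re


def tokens(text: str) -> set[str]:
    """Lowercased alphanumeric word tokens (an assumed stand-in for a lexical signal)."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a: set[str], b: set[str]) -> float:
    """|a & b| / |a | b|, treated as 1.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def local_continuity(steps: list[str]) -> list[float]:
    """Similarity of each step to the step immediately before it."""
    toks = [tokens(s) for s in steps]
    return [jaccard(prev, cur) for prev, cur in zip(toks, toks[1:])]


def origin_persistence(steps: list[str]) -> list[float]:
    """Similarity of each later step to the originating step (step 0)."""
    toks = [tokens(s) for s in steps]
    return [jaccard(toks[0], cur) for cur in toks[1:]]


def mismatch_steps(steps: list[str], local_min: float = 0.5, origin_max: float = 0.2) -> list[int]:
    """Indices of steps that look locally coherent but have drifted from the origin.

    Both thresholds are illustrative placeholders, not values from the paper.
    """
    local = local_continuity(steps)
    origin = origin_persistence(steps)
    return [
        i
        for i, (loc, org) in enumerate(zip(local, origin), start=1)
        if loc >= local_min and org <= origin_max
    ]


# Toy trajectory: each step keeps three of the previous step's four words.
steps = [
    "plan fetch parse validate",
    "fetch parse validate summarize",
    "parse validate summarize format",
    "validate summarize format review",
    "summarize format review publish",
]
print([round(x, 2) for x in local_continuity(steps)])    # [0.6, 0.6, 0.6, 0.6]
print([round(x, 2) for x in origin_persistence(steps)])  # [0.6, 0.33, 0.14, 0.0]
print(mismatch_steps(steps))                             # [3, 4]
```

Every adjacent pair in this toy trajectory shares three of its four words, so local continuity stays flat at 0.6, while overlap with the first step falls to zero by the last step: each transition looks coherent even though the trajectory as a whole has moved away from where it started.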

Abstract

Large language model (LLM) systems increasingly operate through iterative multi-step execution involving retries, branching, refinement, orchestration, convergence, and continuation behaviors. Existing runtime instrumentation systems primarily expose request-level telemetry such as latency, token consumption, execution traces, and workflow events, but provide limited visibility into how execution trajectories evolve over time.

This paper introduces a deterministic framework for analyzing execution trajectory behavior in multi-step LLM workflows using replayable lexical and structural signals. The analysis separates continuation, drift, branching, and convergence execution behaviors using deterministic trajectory-relative measurements. Rather than evaluating semantic correctness or reasoning quality, the framework analyzes structural persistence relative to originating execution conditions.

Across a controlled cross-provider corpus of replayable traces captured from OpenAI and Anthropic models, we observe a repeatable local-versus-global mismatch phenomenon: local continuity between adjacent execution steps can remain high while persistence to the originating trajectory progressively weakens. This creates measurable regimes in which execution appears locally coherent despite structural divergence over longer execution horizons.

The paper further introduces deterministic runtime diagnostics including drift velocity, transition stability, branch divergence, and branch convergence using replayable structural primitives only. Results show that execution families exhibit distinguishable transition behaviors and divergence characteristics across multi-step workflows.

The findings suggest that continuation in iterative LLM systems has runtime implications beyond token efficiency alone. Multi-step execution increasingly requires analysis of execution-state evolution across extended continuation horizons rather than request-level telemetry in isolation.
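The abstract names four runtime diagnostics (drift velocity, transition stability, branch divergence, and branch convergence) but does not define them on this page. The Python sketch below therefore uses hypothetical definitions chosen only for illustration, built on Jaccard overlap of word-token sets as an assumed stand-in for the paper's replayable primitives; it ignores structural signals entirely, and every function name is invented.

```python
from __future__ import annotations

import re
from statistics import pstdev


def tokens(text: str) -> set[str]:
    """Lowercased alphanumeric word tokens (assumed lexical primitive)."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a: set[str], b: set[str]) -> float:
    """Set overlap in [0, 1]; 1.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def drift_velocity(steps: list[str]) -> float:
    """Average per-step increase in distance (1 - Jaccard) from the originating step."""
    if len(steps) < 2:
        return 0.0
    # Distance of step 0 from itself is 0, so the average gain is final distance / transitions.
    final_drift = 1.0 - jaccard(tokens(steps[0]), tokens(steps[-1]))
    return final_drift / (len(steps) - 1)


def transition_stability(steps: list[str]) -> float:
    """1.0 minus the population std dev of adjacent-step similarities (1.0 = uniform transitions)."""
    toks = [tokens(s) for s in steps]
    sims = [jaccard(a, b) for a, b in zip(toks, toks[1:])]
    return 1.0 - pstdev(sims) if sims else 1.0


def branch_divergence(branch_a: list[str], branch_b: list[str]) -> list[float]:
    """Step-aligned distance between two branches forked from a shared step."""
    return [1.0 - jaccard(tokens(a), tokens(b)) for a, b in zip(branch_a, branch_b)]


def branch_convergence(branch_a: list[str], branch_b: list[str]) -> float:
    """Drop in divergence from first to last aligned step; positive means the branches re-converged."""
    div = branch_divergence(branch_a, branch_b)
    return div[0] - div[-1] if div else 0.0


trajectory = [
    "plan fetch parse validate",
    "fetch parse validate summarize",
    "parse validate summarize format",
]
branch_a = ["retry fetch parse", "parse validate report", "report final answer"]
branch_b = ["retry fetch cache", "cache rank report", "report final answer"]

print(round(drift_velocity(trajectory), 2))                          # 0.33
print(transition_stability(trajectory))                              # 1.0
print([round(d, 2) for d in branch_divergence(branch_a, branch_b)])  # [0.5, 0.8, 0.0]
print(round(branch_convergence(branch_a, branch_b), 2))              # 0.5
```

Averaging drift over the whole run reduces it to one number per trajectory; a per-step variant (differences between consecutive distances from the origin) would expose acceleration at the cost of noisier values. The two branches here diverge in the middle step and then re-converge on an identical final step, which the positive convergence score reflects.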
