
April 8, 2026 · Open Access

Agentic Compilation: Mitigating the LLM Rerun Crisis for Minimized Inference Cost Web Automation

Key Points

The aim is to address the Rerun Crisis in LLM-based web agents by minimizing inference costs during automated workflows.
Characterization of the Rerun Crisis and its effects on inference costs.
Development of a Compile-and-Execute architecture to optimize task execution.
Implementation of a DOM Sanitization Module to produce semantic representations.
Empirical evaluation across three enterprise task types: data extraction, form filling, and technology-stack fingerprinting.
Reduction of inference cost for a representative five-step workflow executed 500 times from approximately $150.00 to as low as $0.10, depending on model selection.
Compilation success rates ranged from 80% to 94%.
Execution accuracy ranged from 95% to 98% across the three task types.
Per-compilation inference costs ranged from $0.002 to $0.092 across five frontier models.
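The key points do not say how the DOM Sanitization Module builds its semantic representation. The Python sketch below illustrates one plausible approach and is not the paper's DSM. It drops script and style content and keeps interactive and heading elements with a few locator attributes. It also keeps visible text, which yields a compact line-oriented view that could be sent to the compiling model. The tag and attribute lists and the names `DomSanitizer` and `sanitize` are hypothetical.

```python
from html.parser import HTMLParser

# Hypothetical DSM configuration; these sets are illustrative assumptions.
SKIP_TAGS = {"script", "style", "noscript", "svg"}  # subtrees dropped entirely
KEEP_TAGS = {"a", "button", "input", "select", "textarea", "label",
             "h1", "h2", "h3", "th", "td"}           # elements emitted as lines
KEEP_ATTRS = ("id", "name", "type", "for", "href", "placeholder", "aria-label")


class DomSanitizer(HTMLParser):
    """Reduces raw HTML to a compact, line-oriented semantic view (sketch only)."""

    def __init__(self) -> None:
        super().__init__()
        self.skip_depth = 0          # > 0 while inside a SKIP_TAGS subtree
        self.lines: list[str] = []   # output, one element or text run per line

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self.skip_depth += 1
        elif self.skip_depth == 0 and tag in KEEP_TAGS:
            # Keep only locator-style attributes; classes, styles, handlers are dropped.
            kept = [f"{k}={v}" for k, v in attrs if k in KEEP_ATTRS and v]
            self.lines.append(" ".join([tag, *kept]))

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS and self.skip_depth > 0:
            self.skip_depth -= 1

    def handle_data(self, data):
        # Collapse whitespace and keep visible text as an indented line.
        text = " ".join(data.split())
        if text and self.skip_depth == 0:
            self.lines.append(f'  "{text}"')


def sanitize(html: str) -> str:
    parser = DomSanitizer()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.lines)


page = """<html><head><style>.x { color: red }</style>
<script>track()</script></head>
<body><div class="wrap" data-v="1"><h1>Orders</h1>
<label for="q">Search</label>
<input id="q" name="query" type="text" class="big">
<button id="go" onclick="submit()">Go</button></div></body></html>"""
print(sanitize(page))
```

Dropping styling and event-handler attributes while keeping `id`, `name` and similar locators trades fidelity for token count. Whether that trade suits a given site is a design choice the sketch does not settle.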

Abstract

LLM-driven web agents that operate through continuous inference loops, repeatedly querying a large language model (LLM) to evaluate browser state and select the next action, exhibit a fundamental scalability constraint when applied to repetitive, high-volume tasks. This paper characterizes that constraint as the Rerun Crisis: token expenditure and API latency grow linearly with both task length and execution frequency. Consider a representative five-step data extraction workflow executed 500 times. An unoptimized continuous agent incurs approximately $150.00 in inference costs, and even with aggressive state caching the figure remains near $15.00. We propose a Compile-and-Execute architecture that decouples LLM reasoning from browser execution and reduces per-workflow inference cost to as low as $0.10, depending on model selection. A single one-shot LLM invocation processes a token-efficient semantic representation of the target page and emits a deterministic JSON workflow blueprint. That representation is produced by a DOM Sanitization Module (DSM). A lightweight deterministic runtime then drives the browser without further model queries. We formalize the cost asymmetry as a reduction from O(M·N) to amortized O(1) inference scaling, where M is the number of reruns and N is the number of sequential actions per run. We evaluated the approach across three enterprise task modalities: high-volume data extraction, dynamic form filling, and technology-stack fingerprinting. Compilation success rates were 80–94% and execution accuracies were 95–98%, at per-compilation inference costs between $0.002 and $0.092 across five frontier models. These results establish deterministic compilation as a technically sound paradigm for high-throughput web automation. It makes automation economically viable at scales that were infeasible under continuous inference architectures.
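The O(M·N) versus amortized O(1) comparison can be checked with simple arithmetic. The sketch below assumes a flat cost per LLM call, derived from the abstract's example as $150.00 / (500 runs × 5 steps) = $0.06 per call. It treats the compiled workflow as costing $0.10 in total, the per-workflow figure quoted above. Real per-call costs vary with prompt size and model, so these constants are illustrative only. The function names are hypothetical.

```python
# Per-call cost implied by the abstract's example (assumption: flat cost per call):
# $150.00 / (500 runs x 5 steps) = $0.06 per LLM call.
COST_PER_CALL = 150.00 / (500 * 5)


def continuous_cost(m_runs: int, n_steps: int, cost_per_call: float = COST_PER_CALL) -> float:
    """Continuous agent: one LLM call per action per run, so cost is O(M * N)."""
    return m_runs * n_steps * cost_per_call


def compiled_cost(compile_cost: float, compilations: int = 1) -> float:
    """Compile-and-Execute: LLM cost is paid only at compile time.

    The deterministic runtime makes no model calls, so the total does not depend
    on M or N. `compilations` > 1 would model recompiling (e.g. after a failed
    compile); that parameter is an assumption, not something the abstract describes.
    """
    return compile_cost * compilations


if __name__ == "__main__":
    for m in (1, 500, 5_000):
        total = compiled_cost(0.10)
        print(f"M={m:>5}: continuous ${continuous_cost(m, 5):>9,.2f} | "
              f"compiled ${total:.2f} total, ${total / m:.5f} per run")
```

In this model, doubling M doubles the continuous agent's bill while the compiled total stays fixed, so the per-run share shrinks as 1/M. The abstract reports that state caching lowers the figure to about $15.00. Here, caching would only shrink the per-call constant, not the linear growth.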
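The sketch below makes the execute half of the architecture concrete by interpreting a compiled blueprint deterministically. The JSON schema is hypothetical: a `steps` list with `goto`, `fill`, `click` and `extract` actions. The `Browser` interface and `run_blueprint` function are also hypothetical, and the paper's actual blueprint format is not given here. A real runtime would wrap a browser automation library. `FakeBrowser` stands in for one so the example runs on its own.

```python
import json
from typing import Protocol


class Browser(Protocol):
    """Minimal driver interface the runtime needs (hypothetical)."""

    def goto(self, url: str) -> None: ...
    def fill(self, selector: str, value: str) -> None: ...
    def click(self, selector: str) -> None: ...
    def text(self, selector: str) -> str: ...


def run_blueprint(blueprint_json: str, browser: Browser, params: dict[str, str]) -> dict[str, str]:
    """Execute a compiled blueprint step by step; no LLM is consulted at run time."""
    blueprint = json.loads(blueprint_json)
    extracted: dict[str, str] = {}
    for i, step in enumerate(blueprint["steps"]):
        action = step["action"]
        if action == "goto":
            browser.goto(step["url"])
        elif action == "fill":
            # Per-run values come from `params`, so one blueprint serves every rerun.
            browser.fill(step["selector"], params[step["param"]])
        elif action == "click":
            browser.click(step["selector"])
        elif action == "extract":
            extracted[step["field"]] = browser.text(step["selector"])
        else:
            # No model to fall back on: fail loudly instead of improvising.
            raise ValueError(f"step {i}: unknown action {action!r}")
    return extracted


class FakeBrowser:
    """In-memory stand-in for a real browser driver, for demonstration only."""

    def __init__(self, pages: dict[str, dict[str, str]]) -> None:
        self.pages = pages            # url -> {selector: visible text}
        self.url = ""
        self.inputs: dict[str, str] = {}
        self.log: list[tuple[str, str]] = []

    def goto(self, url: str) -> None:
        self.url = url
        self.log.append(("goto", url))

    def fill(self, selector: str, value: str) -> None:
        self.inputs[selector] = value
        self.log.append(("fill", selector))

    def click(self, selector: str) -> None:
        self.log.append(("click", selector))

    def text(self, selector: str) -> str:
        return self.pages[self.url][selector]


# Hypothetical blueprint, standing in for what the one-shot compile step might emit.
BLUEPRINT = json.dumps({"steps": [
    {"action": "goto", "url": "https://example.com/orders"},
    {"action": "fill", "selector": "#q", "param": "query"},
    {"action": "click", "selector": "#go"},
    {"action": "extract", "selector": ".total", "field": "total"},
]})

browser = FakeBrowser({"https://example.com/orders": {".total": "42 orders"}})
print(run_blueprint(BLUEPRINT, browser, {"query": "2026-04"}))
```

Per-run values enter through `params` rather than through the model, so the same blueprint can be replayed for each of the M reruns without further inference. The sketch raises an error on an unknown action instead of falling back to a model call. That is one possible design choice, and it keeps execution fully deterministic.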
