LLM-driven web agents that operate through continuous inference loops, repeatedly querying a large language model (LLM) to evaluate browser state and select the next action, exhibit a fundamental scalability constraint when applied to repetitive, high-volume tasks. This paper characterizes that constraint as the Rerun Crisis: the linear growth of token expenditure and API latency with respect to both task length and execution frequency. For a representative five-step data extraction workflow executed across 500 iterations, an unoptimized continuous agent incurs approximately $150.00 in inference costs; even with aggressive state caching, this figure remains near $15.00. We propose a Compile-and-Execute architecture that decouples LLM reasoning from browser execution, reducing per-workflow inference cost to as low as $0.10 depending on model selection. A one-shot LLM invocation processes a token-efficient semantic representation of the target page, produced by a DOM Sanitization Module (DSM), and emits a deterministic JSON workflow blueprint. A lightweight deterministic runtime then drives the browser without further model queries. We formalize the cost asymmetry as a reduction from O(M × N) to amortized O(1) inference scaling, where M is the number of reruns and N is the number of sequential actions per run. Empirical evaluation across three enterprise task modalities (high-volume data extraction, dynamic form filling, and technology-stack fingerprinting) yields compilation success rates of 80–94% and execution accuracies of 95–98%, at per-compilation inference costs between $0.002 and $0.092 across five frontier models. These results establish deterministic compilation as a technically sound paradigm for high-throughput web automation, enabling economically viable automation at scales previously infeasible under continuous inference architectures.
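The cost asymmetry described above can be illustrated with a small back-of-the-envelope model. This is a minimal sketch, not the paper's formalization. The function names are hypothetical. The uniform per-call cost is an assumption, obtained by dividing the abstract's 150.00 total by the 500 × 5 model calls. A single successful compilation is also assumed.

```python
# Illustrative cost model for continuous inference vs. compile-and-execute.
# Assumption: every model call costs the same amount (uniform per-call cost).

def continuous_cost(reruns: int, steps: int, cost_per_call: float) -> float:
    """Continuous agent: one model call per action, per run, i.e. O(M * N)."""
    return reruns * steps * cost_per_call


def compiled_cost(compile_cost: float, compile_attempts: int = 1) -> float:
    """Compile-and-execute: pay only for compilation; the deterministic
    runtime makes no further model calls, so cost does not grow with reruns."""
    return compile_cost * compile_attempts


def amortized_per_run(compile_cost: float, reruns: int) -> float:
    """Inference cost attributed to each run once compilation is amortized."""
    return compiled_cost(compile_cost) / reruns


M, N = 500, 5                 # reruns and actions per run, from the abstract
per_call = 150.00 / (M * N)   # implied per-call cost under the uniform assumption

baseline = continuous_cost(M, N, per_call)
compiled = compiled_cost(0.092)   # upper end of the reported compilation cost range

print(f"continuous: {baseline:.2f}")
print(f"compiled:   {compiled:.3f}")
print(f"per run:    {amortized_per_run(0.092, M):.6f}")
```

Doubling either M or N doubles the continuous figure, while the compiled figure changes only if the workflow has to be recompiled, for example after a failed compilation or a change to the page.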
Jagadeesh Chundru (Mon,) studied this question.