Graphical User Interface (GUI) testing has historically struggled with the “semantic gap” between human understanding and machine execution. Large Language Models (LLMs) are now bridging this gap by enabling a transition from automating repetitive actions to automating cognitive processes. This article presents a process-centric review of 55 seminal studies published between January 2023 and July 2025 to systematize this rapid evolution. Unlike existing surveys that focus on isolated architectural elements, we analyze the integration of LLM agents across the entire testing lifecycle, from test design and scripting to execution, oracle verification, and maintenance. Our analysis reveals three key findings: (1) Architecture: Effective agents have converged on a “spatial-semantic” perception model, combining visual screenshots with Document Object Model (DOM) structures to ground high-level intent into precise actions. (2) Lifecycle Impact: The paradigm is shifting from rigid, specification-based script generation to autonomous, intent-driven exploration and self-healing maintenance via abstraction-concretization mechanisms. (3) Evaluation: While current benchmarks effectively measure task completion, a disconnect remains between academic prototypes and industrial requirements regarding reliability, cost, and latency. The article concludes by identifying critical gaps in business process testing and outlining a research roadmap to advance LLM-based testing from experimental prototypes to robust, enterprise-grade quality assurance solutions.
Trong et al. studied this question.