What type of study is this?

This is a quantitative study.

October 12, 2025 · Open Access

MAS-Bench: A Unified Benchmark for Shortcut-Augmented Hybrid Mobile GUI Agents

Key Points

Hybrid agents achieve significantly higher success rates and efficiency than their GUI-only counterparts.
The benchmark comprises 139 complex tasks across 11 real-world applications, a knowledge base of 88 predefined shortcuts (APIs, deep links, RPA scripts), and 7 evaluation metrics.
Beyond using predefined shortcuts, MAS-Bench evaluates an agent's ability to autonomously generate shortcuts by discovering and creating reusable, low-cost workflows.
MAS-Bench fills a critical evaluation gap and provides a foundation for building more efficient and robust intelligent agents.

Abstract

To enhance the efficiency of GUI agents on various platforms like smartphones and computers, a hybrid paradigm that combines flexible GUI operations with efficient shortcuts (e.g., API, deep links) is emerging as a promising direction. However, a framework for systematically benchmarking these hybrid agents is still underexplored. To take the first step in bridging this gap, we introduce MAS-Bench, a benchmark that pioneers the evaluation of GUI-shortcut hybrid agents with a specific focus on the mobile domain. Beyond merely using predefined shortcuts, MAS-Bench assesses an agent's capability to autonomously generate shortcuts by discovering and creating reusable, low-cost workflows. It features 139 complex tasks across 11 real-world applications, a knowledge base of 88 predefined shortcuts (APIs, deep-links, RPA scripts), and 7 evaluation metrics. The tasks are designed to be solvable via GUI-only operations, but can be significantly accelerated by intelligently embedding shortcuts. Experiments show that hybrid agents achieve significantly higher success rates and efficiency than their GUI-only counterparts. This result also demonstrates the effectiveness of our method for evaluating an agent's shortcut generation capabilities. MAS-Bench fills a critical evaluation gap, providing a foundational platform for future advancements in creating more efficient and robust intelligent agents.
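The hybrid paradigm described in the abstract, where an agent invokes a shortcut when one fits and otherwise falls back to GUI operations, can be sketched in Python as below. This is a minimal illustration under stated assumptions, not MAS-Bench's implementation: the `Shortcut` and `HybridAgent` classes, the `search` subtask key, and the `https://www.example.com/search` deep link are all hypothetical, and the action count is only a crude stand-in for the benchmark's efficiency metrics, which are not detailed here. The one piece of real-world syntax is the Android `adb shell am start -a android.intent.action.VIEW -d <uri>` command for opening a deep link.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List
from urllib.parse import quote


@dataclass
class Shortcut:
    """Hypothetical schema for one knowledge-base entry (API, deep link, or RPA script)."""
    name: str
    kind: str  # "api", "deeplink", or "rpa"; informational only in this sketch
    run: Callable[[Dict[str, str]], List[str]]  # returns the low-level commands it would issue


def deeplink(uri_template: str) -> Callable[[Dict[str, str]], List[str]]:
    """Build a shortcut action that opens a deep link via an Android VIEW intent."""
    def run(args: Dict[str, str]) -> List[str]:
        # URL-encode each argument before substituting it into the template
        uri = uri_template.format(**{k: quote(v) for k, v in args.items()})
        # One adb command replaces the whole GUI sequence for this subtask
        return [f'adb shell am start -a android.intent.action.VIEW -d "{uri}"']
    return run


@dataclass
class HybridAgent:
    shortcuts: Dict[str, Shortcut]  # hypothetical knowledge base keyed by subtask name
    log: List[str] = field(default_factory=list)

    def execute(self, subtask: str, args: Dict[str, str], gui_steps: List[str]) -> int:
        """Run one subtask, preferring a matching shortcut over GUI steps.

        Returns the number of actions issued, a crude stand-in for cost.
        """
        shortcut = self.shortcuts.get(subtask)
        if shortcut is not None:
            commands = shortcut.run(args)
            self.log.extend(commands)
            return len(commands)
        # No shortcut available: fall back to the GUI-only action sequence
        self.log.extend(f"gui: {step}" for step in gui_steps)
        return len(gui_steps)


# Hypothetical usage: the same subtask solved with and without a shortcut
kb = {"search": Shortcut("search", "deeplink",
                         deeplink("https://www.example.com/search?q={query}"))}
gui = ["tap search bar", "type query", "tap search button"]

hybrid = HybridAgent(kb)
gui_only = HybridAgent({})
hybrid_steps = hybrid.execute("search", {"query": "coffee shops"}, gui)
gui_only_steps = gui_only.execute("search", {"query": "coffee shops"}, gui)
print(hybrid_steps, gui_only_steps)  # 1 3
print(hybrid.log[0])
```

Here the deep link collapses a three-step GUI sequence into a single command. A real hybrid agent would also have to judge whether a shortcut's preconditions hold before invoking it, which this sketch ignores.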

Cite This Study

Zhao et al. (2025). MAS-Bench: A Unified Benchmark for Shortcut-Augmented Hybrid Mobile GUI Agents.

synapsesocial.com/papers/68ec1be02b8fa9b2b78ad0f7 https://doi.org/10.48550/arxiv.2509.06477
