Score: 3

Advancing Mobile GUI Agents: A Verifier-Driven Approach to Practical Deployment

Published: March 20, 2025 | arXiv ID: 2503.15937v3

By: Gaole Dai, Shiqi Jiang, Ting Cao, and more

BigTech Affiliations: Microsoft

Potential Business Impact:

Lets an AI agent carry out tasks in phone apps on its own, with a higher success rate and lower latency per step.

Business Areas:
Android Mobile, Platforms, Software

We propose V-Droid, a mobile GUI task automation agent. Unlike previous mobile agents that utilize Large Language Models (LLMs) as generators to directly generate actions at each step, V-Droid employs LLMs as verifiers to evaluate candidate actions before making final decisions. To realize this novel paradigm, we introduce a comprehensive framework for constructing verifier-driven mobile agents: the discretized action space construction coupled with the prefilling-only workflow to accelerate the verification process, the pair-wise progress preference training to significantly enhance the verifier's decision-making capabilities, and the scalable human-agent joint annotation scheme to efficiently collect the necessary data at scale. V-Droid obtains a substantial task success rate across several public mobile task automation benchmarks: 59.5% on AndroidWorld, 38.3% on AndroidLab, and 49% on MobileAgentBench, surpassing existing agents by 5.2%, 2.1%, and 9%, respectively. Furthermore, V-Droid achieves a remarkably low latency of 4.3s per step, which is 6.1X faster compared with existing mobile agents. The source code is available at https://github.com/V-Droid-Agent/V-Droid.
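The verifier-driven idea in the abstract can be illustrated with a minimal sketch: instead of asking a model to generate one action, enumerate a discrete set of candidate actions, score each one with a verifier, and execute the highest-scoring one. Everything named here (`select_action`, `toy_verifier`, the string-based actions) is hypothetical and not taken from the V-Droid code base; in particular, the toy word-overlap scorer merely stands in for the LLM verifier described in the paper.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical signature: (task, screen_state, candidate_action) -> score
Verifier = Callable[[str, str, str], float]


def select_action(
    task: str,
    state: str,
    candidates: Sequence[str],
    verifier: Verifier,
) -> Tuple[str, List[float]]:
    """Score every candidate action and return the best one with all scores."""
    if not candidates:
        raise ValueError("at least one candidate action is required")
    # Each candidate is scored independently, so in a real system these
    # calls could be batched; here they run one after another.
    scores = [verifier(task, state, action) for action in candidates]
    best_index = max(range(len(candidates)), key=scores.__getitem__)
    return candidates[best_index], scores


def toy_verifier(task: str, state: str, action: str) -> float:
    """Stand-in scorer (assumption): fraction of action words found in the task.

    A real verifier would be a trained model judging whether the action
    makes progress on the task given the current screen state.
    """
    task_words = set(task.lower().split())
    action_words = action.lower().split()
    if not action_words:
        return 0.0
    overlap = sum(1 for word in action_words if word in task_words)
    return overlap / len(action_words)


if __name__ == "__main__":
    chosen, scores = select_action(
        task="open settings and enable wifi",
        state="home screen",
        candidates=["tap Camera", "tap Settings", "scroll down"],
        verifier=toy_verifier,
    )
    print(chosen, scores)
```

Because the candidates come from a fixed, discrete set, the verifier only has to judge each option rather than produce free-form text, which is the property the abstract credits for the reduced per-step latency.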

Country of Origin
🇭🇰 🇺🇸 🇸🇬 🇨🇳 Hong Kong, Singapore, China, United States

Repos / Data Links
https://github.com/V-Droid-Agent/V-Droid

Page Count
14 pages

Category
Computer Science:
Artificial Intelligence