InfiGUI-G1: Advancing GUI Grounding with Adaptive Exploration Policy Optimization

Published: August 7, 2025 | arXiv ID: 2508.05731v1

By: Yuhang Liu, Zeyu Liu, Shuanghe Zhu and more

Potential Business Impact:

Helps computers understand screen buttons from pictures.

The emergence of Multimodal Large Language Models (MLLMs) has propelled the development of autonomous agents that operate on Graphical User Interfaces (GUIs) using pure visual input. A fundamental challenge is robustly grounding natural language instructions. This requires precise spatial alignment, which accurately locates the coordinates of each element, and, more critically, correct semantic alignment, which matches the instructions to the functionally appropriate UI element. Although Reinforcement Learning with Verifiable Rewards (RLVR) has proven effective at improving spatial alignment for these MLLMs, we find that inefficient exploration bottlenecks semantic alignment, which prevents models from learning difficult semantic associations. To address this exploration problem, we present Adaptive Exploration Policy Optimization (AEPO), a new policy optimization framework. AEPO employs a multi-answer generation strategy to enforce broader exploration, which is then guided by a theoretically grounded Adaptive Exploration Reward (AER) function derived from first principles of efficiency, $\eta = U/C$. Our AEPO-trained models, InfiGUI-G1-3B and InfiGUI-G1-7B, establish new state-of-the-art results across multiple challenging GUI grounding benchmarks, achieving significant relative improvements of up to 9.0% against the naive RLVR baseline on benchmarks designed to test generalization and semantic understanding. Resources are available at https://github.com/InfiXAI/InfiGUI-G1.
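To make the efficiency ratio $\eta = U/C$ concrete, here is a minimal toy sketch in Python. It is not the paper's AER function, whose exact form the abstract does not give. The sketch assumes, purely for illustration, that utility U is 1 when any proposed click point lands inside the target element's bounding box and 0 otherwise, and that cost C is the number of points proposed. The names `point_in_box` and `toy_efficiency` are hypothetical.

```python
from typing import List, Tuple

Point = Tuple[float, float]
Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)


def point_in_box(point: Point, box: Box) -> bool:
    """Return True if the point lies inside the box (edges inclusive)."""
    x, y = point
    x_min, y_min, x_max, y_max = box
    return x_min <= x <= x_max and y_min <= y <= y_max


def toy_efficiency(candidates: List[Point], target: Box) -> float:
    """Toy eta = U / C for a multi-answer grounding attempt.

    ASSUMPTION (not from the paper): U is 1.0 if any candidate hits the
    target box and 0.0 otherwise; C is the number of candidates proposed.
    """
    if not candidates:
        return 0.0  # nothing proposed: no utility, and avoid dividing by zero
    utility = 1.0 if any(point_in_box(p, target) for p in candidates) else 0.0
    cost = float(len(candidates))
    return utility / cost


if __name__ == "__main__":
    button = (0.0, 0.0, 10.0, 10.0)
    print(toy_efficiency([(5.0, 5.0)], button))                # one guess, one hit
    print(toy_efficiency([(50.0, 50.0), (5.0, 5.0)], button))  # two guesses, one hit
```

Under these assumptions, proposing more candidates raises the chance of a hit but lowers the ratio on success. This is the kind of trade-off between exploring broadly and answering efficiently that an efficiency-based reward is meant to balance.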

Graph-Enhanced Policy Optimization in LLM Agent Training

Artificial Intelligence

Teaches AI to learn better by seeing connections.

30 Oct 2025 1

89%

Enhancing Visual Grounding for GUI Agents via Self-Evolutionary Reinforcement Learning

Artificial Intelligence

Helps computers understand and click on screen buttons.

18 May 2025 2

89%

GuirlVG: Incentivize GUI Visual Grounding via Empirical Exploration on Reinforcement Learning

Artificial Intelligence

Teaches computers to understand screens with less training.

6 Aug 2025 0

Country of Origin

🇭🇰 🇨🇳 Hong Kong, China

Page Count

11 pages