LLMs as Scalable, General-Purpose Simulators For Evolving Digital Agent Training
By: Yiming Wang, Da Yin, Yuedong Cui and more
Potential Business Impact:
Generates synthetic UI interactions to train digital agents.
Digital agents require diverse, large-scale UI trajectories to generalize across real-world tasks, yet collecting such data is prohibitively expensive in terms of human annotation, infrastructure, and engineering effort. To this end, we introduce $\textbf{UI-Simulator}$, a scalable paradigm that generates structured UI states and transitions to synthesize training trajectories at scale. Our paradigm integrates a digital world simulator for diverse UI states, a guided rollout process for coherent exploration, and a trajectory wrapper that produces high-quality and diverse trajectories for agent training. We further propose $\textbf{UI-Simulator-Grow}$, a targeted scaling strategy that enables more rapid and data-efficient scaling by prioritizing high-impact tasks and synthesizing informative trajectory variants. Experiments on WebArena and AndroidWorld show that UI-Simulator rivals or surpasses open-source agents trained on real UIs with significantly better robustness, despite using weaker teacher models. Moreover, UI-Simulator-Grow matches the performance of Llama-3-70B-Instruct using only Llama-3-8B-Instruct as the base model, highlighting the potential of the targeted synthesis scaling paradigm to continuously and efficiently enhance digital agents.
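The three components named in the abstract (a world simulator, a guided rollout, and a trajectory wrapper) can be pictured with a toy sketch. Everything below is an illustrative assumption rather than the authors' implementation: the names `Step`, `Trajectory`, `toy_simulator`, `toy_policy`, `wrap_trajectory`, and `synthesize` are hypothetical, and a string-concatenating function stands in for the LLM that would generate the next UI state.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Step:
    state: str   # UI state observed before acting
    action: str  # action taken in that state


@dataclass
class Trajectory:
    task: str
    steps: List[Step] = field(default_factory=list)


def toy_simulator(state: str, action: str) -> str:
    """Hypothetical stand-in for an LLM world simulator: returns the next UI state."""
    return f"{state} > {action}"


def toy_policy(state: str, step_index: int) -> str:
    """Hypothetical guide that proposes the next action (here a fixed script)."""
    script = ["click(search)", "type(query)", "press(enter)"]
    return script[step_index % len(script)]


def wrap_trajectory(steps: List[Step], final_state: str) -> Trajectory:
    """Toy trajectory wrapper: attaches a task description derived from the end state."""
    return Trajectory(task=f"Reach: {final_state}", steps=steps)


def synthesize(
    start_state: str,
    propose_action: Callable[[str, int], str],
    simulate: Callable[[str, str], str],
    max_steps: int = 3,
) -> Trajectory:
    """Roll out the guide inside the simulator, then wrap the result for training."""
    steps: List[Step] = []
    state = start_state
    for i in range(max_steps):
        action = propose_action(state, i)
        steps.append(Step(state, action))
        state = simulate(state, action)
    return wrap_trajectory(steps, state)


if __name__ == "__main__":
    traj = synthesize("home", toy_policy, toy_simulator)
    print(traj.task)
    for step in traj.steps:
        print(step.state, "->", step.action)
```

In this sketch the simulator and the guide are passed in as plain callables, so either could be swapped for a model-backed function without changing the rollout loop.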
Similar Papers
UISim: An Interactive Image-Based UI Simulator for Dynamic Mobile Environments
CV and Pattern Recognition
Lets AI learn to use phone apps from pictures.
Goal Alignment in LLM-Based User Simulators for Conversational AI
Computation and Language
Makes chatbots stick to their goals.
From User Interface to Agent Interface: Efficiency Optimization of UI Representations for LLM Agents
Software Engineering
Makes AI assistants understand screens faster and better.