WebSynthesis: World-Model-Guided MCTS for Efficient WebUI-Trajectory Synthesis
By: Yifei Gao, Junhong Ye, Jiaqi Wang and more
Potential Business Impact:
Creates fake websites to train smarter web robots.
Recent advancements in large language models (LLMs) have significantly improved the capabilities of web agents. However, effectively navigating complex and dynamic web environments still requires more advanced trajectory-level planning and execution. Prior studies have addressed self-improving agents by collecting extensive GUI trajectories from real-environment interactions. Despite their effectiveness, these approaches encounter two critical challenges: (1) Uncontrollable environment states, where real or sandboxed web environments often yield unstable and non-deterministic feedback, complicating the reproduction and debugging of agent behaviors; and (2) High API costs, as generating even a single interaction trajectory can involve hundreds of queries, leading to considerable API usage and computational expenses. To address these limitations and enable scalable self-improvement for agents, we propose WebSynthesis, a novel framework for trajectory synthesis and training. WebSynthesis leverages a learned world model to simulate virtual web environments, allowing a policy agent to perform efficient and reversible tree-based planning. This approach supports the large-scale generation of diverse and high-quality trajectories, which are subsequently utilized to refine the agent's policy. Experimental results demonstrate that an agent trained using WebSynthesis on a small-scale synthetic dataset achieves performance comparable to or even surpassing that of models trained on large-scale real-world data.
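To make the idea of tree-based planning inside a simulated environment concrete, the following is a minimal sketch of Monte Carlo tree search run against a stand-in world model. It is not the authors' implementation: the abstract does not specify the search details, so the toy site `TOY_SITE`, the lookup-table `world_model_step`, the binary goal reward, the UCB1 constant, and all function names are assumptions made for illustration. In WebSynthesis the world model is learned, whereas here a fixed transition table plays that role so the example runs on its own.

```python
import math
import random

# Hypothetical stand-in for a learned world model: states are page names and
# the table gives the predicted next page for each action (assumed, toy data).
TOY_SITE = {
    "home": {"click_search": "results", "click_login": "login"},
    "results": {"click_item": "item", "click_back": "home"},
    "login": {"click_back": "home"},
    "item": {"click_buy": "checkout", "click_back": "results"},
    "checkout": {},
}
GOAL = "checkout"  # assumed task: reach the checkout page


def world_model_step(state, action):
    """Predict the next state; a real system would query a learned model here."""
    return TOY_SITE[state][action]


class Node:
    def __init__(self, state, parent=None, action=None):
        self.state, self.parent, self.action = state, parent, action
        self.children = {}  # action -> Node
        self.visits = 0
        self.value = 0.0    # sum of rewards backed up through this node

    def untried_actions(self):
        return [a for a in TOY_SITE[self.state] if a not in self.children]


def ucb1(child, parent_visits, c=1.4):
    """Mean reward plus an exploration bonus for rarely visited children."""
    return child.value / child.visits + c * math.sqrt(math.log(parent_visits) / child.visits)


def rollout(state, rng, max_depth=6):
    """Random walk in the simulated site; reward 1.0 if the goal is reached."""
    for _ in range(max_depth):
        if state == GOAL:
            return 1.0
        actions = list(TOY_SITE[state])
        if not actions:
            return 0.0
        state = world_model_step(state, rng.choice(actions))
    return 1.0 if state == GOAL else 0.0


def mcts(root_state, iterations=500, seed=0):
    rng = random.Random(seed)
    root = Node(root_state)
    for _ in range(iterations):
        node = root
        # Selection: descend through fully expanded nodes using UCB1.
        while not node.untried_actions() and node.children:
            node = max(node.children.values(), key=lambda ch: ucb1(ch, node.visits))
        # Expansion: simulate one untried action with the world model.
        untried = node.untried_actions()
        if untried:
            action = rng.choice(untried)
            child = Node(world_model_step(node.state, action), node, action)
            node.children[action] = child
            node = child
        # Simulation and backpropagation.
        reward = rollout(node.state, rng)
        while node is not None:
            node.visits += 1
            node.value += reward
            node = node.parent
    return root


def best_trajectory(root):
    """Follow the most-visited child at each level to extract one trajectory."""
    node, path = root, []
    while node.children and node.state != GOAL:
        node = max(node.children.values(), key=lambda ch: ch.visits)
        path.append((node.action, node.state))
    return path


if __name__ == "__main__":
    print(best_trajectory(mcts("home")))
```

Because every transition comes from the simulated model rather than a live site, the search can revisit and branch from any earlier state at no extra cost, which is the sense in which such planning is reversible. Trajectories extracted this way could then serve as training data for the policy, under the assumptions stated above.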
Similar Papers
SynWorld: Virtual Scenario Synthesis for Agentic Action Knowledge Refinement
Computation and Language
Helps AI learn new tasks by trying out plans.
Explorer: Scaling Exploration-driven Web Trajectory Synthesis for Multimodal Web Agents
Artificial Intelligence
Teaches robots to use the internet better.
Adapting Web Agents with Synthetic Supervision
Machine Learning (CS)
Teaches robots to use any website perfectly.