AutoForge: Automated Environment Synthesis for Agentic Reinforcement Learning
By: Shihao Cai, Runnan Fang, Jialong Wu, and more
Potential Business Impact:
Teaches AI to handle hard tasks in simulated worlds.
Conducting reinforcement learning (RL) in simulated environments offers a cost-effective and highly scalable way to enhance language-based agents. However, previous work has been limited to semi-automated environment synthesis or to tasks lacking sufficient difficulty, offering little breadth or depth. In addition, the instability of the simulated users integrated into these environments, along with the heterogeneity across simulated environments, poses further challenges for agentic RL. In this work, we propose: (1) a unified pipeline for the automated, scalable synthesis of simulated environments paired with high-difficulty but easily verifiable tasks; and (2) an environment-level RL algorithm that not only mitigates user instability but also performs advantage estimation at the environment level, thereby improving training efficiency and stability. Comprehensive evaluations on agentic benchmarks, including tau-bench, tau2-Bench, and VitaBench, validate the effectiveness of our proposed method. Further in-depth analyses underscore its out-of-domain generalization.
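The abstract does not spell out the algorithm, but the core idea of environment-level advantage estimation can be illustrated with a minimal sketch: group rollouts by their source environment and normalize each reward against that group's statistics, so that heterogeneous reward scales across synthesized environments do not skew the policy update. Everything below is an assumption for illustration; the function name `estimate_env_level_advantages` and the rollout schema are hypothetical, not the authors' implementation.

```python
from collections import defaultdict
from statistics import mean, pstdev

def estimate_env_level_advantages(rollouts, eps=1e-8):
    """Hypothetical sketch of environment-level advantage estimation.

    `rollouts` is assumed to be a list of dicts with keys:
      "env_id": identifier of the simulated environment,
      "reward": scalar task reward (e.g., a binary verification outcome).
    Returns the same list with an "advantage" key added to each entry.
    """
    # Group rewards by their source environment.
    by_env = defaultdict(list)
    for r in rollouts:
        by_env[r["env_id"]].append(r["reward"])

    # Per-environment baseline (mean) and scale (population std dev).
    stats = {env_id: (mean(rs), pstdev(rs)) for env_id, rs in by_env.items()}

    for r in rollouts:
        mu, sigma = stats[r["env_id"]]
        # Center and scale within the environment group; eps guards
        # against zero variance when all rollouts in an environment tie.
        r["advantage"] = (r["reward"] - mu) / (sigma + eps)
    return rollouts

# Example: two environments with different reward scales.
batch = [
    {"env_id": "retail_env", "reward": 1.0},
    {"env_id": "retail_env", "reward": 0.0},
    {"env_id": "airline_env", "reward": 0.8},
    {"env_id": "airline_env", "reward": 0.2},
]
for r in estimate_env_level_advantages(batch):
    print(r["env_id"], round(r["advantage"], 3))
```

Computing the baseline per environment, rather than per prompt or over the whole batch, is one plausible reading of "advantage estimation at the environment level": it absorbs reward-scale differences between synthesized environments before they reach the gradient.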
Similar Papers
The Landscape of Agentic Reinforcement Learning for LLMs: A Survey
Artificial Intelligence
Lets AI learn to make smart choices.
RobotArena ∞: Scalable Robot Benchmarking via Real-to-Sim Translation
Robotics
Tests robots better using videos and online help.
AutoTool: Dynamic Tool Selection and Integration for Agentic Reasoning
Computation and Language
Lets AI learn to pick the right tools.