Procedural Environment Generation for Tool-Use Agents
By: Michael Sullivan, Mareike Hartmann, Alexander Koller
Potential Business Impact:
Teaches AI to use tools better with fake practice.
Although the power of LLM tool-use agents has ignited a flurry of recent research in this area, the curation of tool-use training data remains an open problem, especially for online RL training. Existing approaches to synthetic tool-use data generation tend to be non-interactive and/or non-compositional. We introduce RandomWorld, a pipeline for the procedural generation of interactive tools and compositional tool-use data. We show that models tuned via SFT and RL on synthetic RandomWorld data improve on a range of tool-use benchmarks, and set a new SoTA for two metrics on the NESTFUL dataset. Further experiments show that downstream performance scales with the amount of RandomWorld-generated training data, opening up the possibility of further improvement through the use of entirely synthetic data.
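To make "compositional" and "interactive" concrete, here is a minimal toy sketch of procedurally sampling a chain of typed tools and executing it to obtain a ground-truth answer. It is not the RandomWorld pipeline, whose design the abstract does not describe. The `Tool` class, the four toy tools, and the `sample_chain` and `run_chain` helpers are all hypothetical names introduced only for illustration.

```python
import random
from dataclasses import dataclass
from typing import Any, Callable, List


@dataclass(frozen=True)
class Tool:
    """A hypothetical typed tool: consumes in_type, produces out_type."""
    name: str
    in_type: str
    out_type: str
    fn: Callable[[Any], Any]


# Toy tool set (assumption: invented for this sketch, not taken from the paper).
TOOLS = [
    Tool("str_len", "str", "int", len),
    Tool("double", "int", "int", lambda x: 2 * x),
    Tool("to_text", "int", "str", str),
    Tool("reverse", "str", "str", lambda s: s[::-1]),
]


def sample_chain(start_type: str, length: int, rng: random.Random) -> List[Tool]:
    """Sample a tool sequence in which each output type feeds the next input type."""
    chain, current = [], start_type
    for _ in range(length):
        candidates = [t for t in TOOLS if t.in_type == current]
        tool = rng.choice(candidates)
        chain.append(tool)
        current = tool.out_type
    return chain


def run_chain(chain: List[Tool], value: Any) -> Any:
    """Execute the tools in order; the result can serve as a checkable target."""
    for tool in chain:
        value = tool.fn(value)
    return value


rng = random.Random(0)
chain = sample_chain("str", 3, rng)
answer = run_chain(chain, "hello")
print([t.name for t in chain], answer)
```

Because the tools are executable, a sampled chain yields both a task (the tool names and the input) and a verifiable result, which is the property an online RL reward would need in a setup like this.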