R2E-Gym: Procedural Environments and Hybrid Verifiers for Scaling Open-Weights SWE Agents

Published: April 9, 2025 | arXiv ID: 2504.07164v1

By: Naman Jain, Jaskirat Singh, Manish Shetty, and others

Affiliations: University of California, Berkeley

Potential Business Impact:

Helps computers fix coding problems automatically.

Business Areas:

Simulation Software

Technical Abstract:

Improving open-source models on real-world SWE tasks (solving GitHub issues) faces two key challenges: 1) scalable curation of execution environments to train these models, and 2) optimal scaling of test-time compute. We introduce AgentGym, the largest procedurally-curated executable gym environment for training real-world SWE agents, consisting of more than 8.7K tasks. AgentGym is powered by two main contributions.

1) SYNGEN: a synthetic data curation recipe that enables scalable curation of executable environments using test generation and back-translation directly from commits, thereby reducing reliance on human-written issues or unit tests. We show that this enables more scalable training, leading to a pass@1 performance of 34.4% on the SWE-Bench Verified benchmark with our 32B model.

2) Hybrid Test-time Scaling: we provide an in-depth analysis of two test-time scaling axes, execution-based and execution-free verifiers, demonstrating that they exhibit complementary strengths and limitations. Test-based verifiers suffer from low distinguishability, while execution-free verifiers are biased and often rely on stylistic features. Surprisingly, we find that while each approach individually saturates around 42-43%, significantly higher gains can be obtained by leveraging their complementary strengths.

Overall, our approach achieves 51% on the SWE-Bench Verified benchmark, a new state of the art for open-weight SWE agents and, for the first time, competitive performance with proprietary models such as o1, o1-preview, and Sonnet-3.5-v2 (with tools). We will open-source our environments, models, and agent trajectories.
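To make the idea of combining the two verifier types concrete, here is a minimal Python sketch of best-of-n patch selection. It assumes a simple weighted sum of an execution-based signal (fraction of generated tests a patch passes) and an execution-free signal (a learned verifier's probability that the patch is correct). The `Candidate` structure, `hybrid_select`, the 0.5 weight, and all numbers are hypothetical illustrations, not the paper's exact combination rule or data.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class Candidate:
    """One candidate patch from an agent rollout (hypothetical structure)."""
    patch_id: str
    tests_passed: int     # execution-based signal: generated tests this patch passes
    tests_total: int      # number of generated tests run against the patch
    verifier_prob: float  # execution-free signal: P("correct") from a learned verifier, in [0, 1]


def execution_score(c: Candidate) -> float:
    """Fraction of generated tests passed; 0.0 when no tests were generated."""
    return c.tests_passed / c.tests_total if c.tests_total else 0.0


def hybrid_select(cands: Sequence[Candidate], weight: float = 0.5) -> Candidate:
    """Return the candidate with the highest weighted sum of both signals.

    `weight` (hypothetical) sets how much the execution-based score counts
    relative to the execution-free verifier probability.
    """
    if not cands:
        raise ValueError("need at least one candidate")
    return max(
        cands,
        key=lambda c: weight * execution_score(c) + (1 - weight) * c.verifier_prob,
    )


# Illustrative (made-up) candidates for a single issue.
cands = [
    Candidate("A", 5, 5, 0.30),
    Candidate("B", 5, 5, 0.80),
    Candidate("C", 2, 5, 0.95),
]

print(max(cands, key=execution_score).patch_id)             # tests only: A (A and B tie; max keeps the first)
print(max(cands, key=lambda c: c.verifier_prob).patch_id)   # verifier only: C
print(hybrid_select(cands).patch_id)                        # hybrid: B
```

In this toy example the tests alone cannot separate A from B (low distinguishability), and the verifier alone prefers C even though it fails most tests; the combined score picks B, which both signals rate highly.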

Country of Origin

United States, Australia

Page Count

27 pages
