Agentic Reinforcement Learning for Real-World Code Repair
By: Siyu Zhu, Anastasiya Karpovich, Albert Chen, and more
Potential Business Impact:
Fixes computer code automatically and reliably.
We tackle the challenge of training reliable code-fixing agents in real repositories, where complex builds and shifting dependencies make evaluation unstable. We developed a verifiable pipeline in which success is defined as post-fix build validation, and improved reproducibility across ~1K real issues by pinning dependencies and disabling automatic upgrades. Building on this, we introduced a simplified pipeline that scales to large-scale reinforcement learning (RL). Using this setup, we performed supervised fine-tuning (SFT) of Qwen3-32B in the full pipeline and applied RL on top of the SFT model in the simplified environment. The SFT model, distilled from GPT-4.1 trajectories, performs on par with its GPT-4.1 teacher while being 56x smaller, and RL added 7-20% absolute gains under matched train-test conditions. "Thinking mode" was on par or worse in our experiments. Both SFT and RL models failed to generalize across environments, highlighting the importance of matching train and test environments when building reliable real-world code-fixing agents.
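The post-fix build validation described above can serve directly as a verifiable reward for RL. The sketch below is a minimal illustration of that idea under our own assumptions, not the authors' implementation: the function name, build command, and timeout are placeholders, since the paper summary does not specify the build tooling.

```python
import subprocess

def build_reward(repo_dir: str, build_cmd: list[str] | None = None) -> float:
    """Binary RL reward: 1.0 if the repository builds after the agent's fix, else 0.0.

    Assumes dependencies are already pinned and automatic upgrades are disabled
    in the repository, so repeated evaluations are reproducible.
    """
    # Hypothetical default build command; real projects would supply their own.
    cmd = build_cmd or ["make", "build"]
    try:
        result = subprocess.run(
            cmd,
            cwd=repo_dir,
            capture_output=True,  # keep build logs out of the reward path
            timeout=1800,         # guard against hanging builds
        )
    except subprocess.TimeoutExpired:
        return 0.0
    return 1.0 if result.returncode == 0 else 0.0
```

A sparse, binary signal like this is easy to verify at scale, which is what makes build-validated success usable as an RL objective across ~1K real issues.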
Similar Papers
RoboGPT-R1: Enhancing Robot Planning with Reinforcement Learning
Artificial Intelligence
Robots learn to follow complex instructions better.
Mitigating Forgetting Between Supervised and Reinforcement Learning Yields Stronger Reasoners
Computation and Language
Makes AI smarter by learning from mistakes.