RoboGPT-R1: Enhancing Robot Planning with Reinforcement Learning
By: Jinrui Liu, Bingyan Nie, Boyu Li, and more
Potential Business Impact:
Robots learn to follow complex instructions better.
Improving the reasoning capabilities of embodied agents is crucial for robots to successfully complete complex human instructions in long-horizon manipulation tasks. Despite the success of large language models and vision language models based on Supervised Fine-Tuning (SFT) in planning tasks, they continue to face challenges in performing long-horizon manipulation tasks in complex real-world environments, owing to their limited common sense and reasoning capabilities. Because aligning general-purpose vision language models to robotic planning tasks via supervised fine-tuning suffers from poor generalization and insufficient physical understanding, we propose RoboGPT-R1, a two-stage fine-tuning framework for embodied planning. In this framework, supervised training first instills foundational knowledge from expert sequences, and reinforcement learning (RL) then addresses the model's shortcomings in visual-spatial understanding and reasoning. To achieve physical understanding and action-sequence consistency in multi-step reasoning tasks, we design a rule-based reward function that jointly considers long-horizon performance and action constraints in the environment. The reasoning model, trained on Qwen2.5-VL-3B, significantly outperforms the larger-scale GPT-4o-mini by 21.33% and surpasses other work trained on Qwen2.5-VL-7B by 20.33% on the EmbodiedBench benchmark.
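The abstract names the two ingredients of the reward (long-horizon performance and action constraints) but not the formula. As a rough illustration of how such a rule-based reward could score a predicted plan, here is a minimal Python sketch; the helper names, the plan representation, and the 0.7/0.3 weights are assumptions made for the example, not values from the paper.

```python
# Illustrative sketch only: the plan format, scoring rules, and weights below
# are assumptions for the example, not the authors' implementation.
from typing import List, Set


def rule_based_reward(
    predicted_plan: List[str],
    expert_plan: List[str],
    valid_actions: Set[str],
) -> float:
    """Score a predicted action sequence against an expert sequence.

    Combines two rule-based terms, mirroring the two ideas named in the abstract:
      * long-horizon performance: how long a prefix of the expert plan is matched,
      * action constraints: whether every predicted step is executable in the
        environment's action space.
    """
    if not expert_plan:
        return 0.0

    # Long-horizon term: reward the longest matching prefix, so an early mistake
    # that derails the rest of the plan is penalized more than a late one.
    matched = 0
    for pred, gold in zip(predicted_plan, expert_plan):
        if pred == gold:
            matched += 1
        else:
            break
    long_horizon_score = matched / len(expert_plan)

    # Constraint term: fraction of predicted steps that are legal actions at all.
    if predicted_plan:
        constraint_score = sum(a in valid_actions for a in predicted_plan) / len(predicted_plan)
    else:
        constraint_score = 0.0

    # Placeholder weights, not values from the paper.
    return 0.7 * long_horizon_score + 0.3 * constraint_score


if __name__ == "__main__":
    expert = ["open(drawer)", "pick(apple)", "place(apple, drawer)", "close(drawer)"]
    predicted = ["open(drawer)", "pick(apple)", "pick(banana)"]
    actions = {"open(drawer)", "close(drawer)", "pick(apple)", "place(apple, drawer)"}
    print(rule_based_reward(predicted, expert, actions))  # 0.7*0.5 + 0.3*(2/3) = 0.55
```

A reward of this shape would be computed per rollout and fed to the RL stage; in practice the paper's reward may also include formatting or reasoning-consistency terms not shown here.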
Similar Papers
Robot-R1: Reinforcement Learning for Enhanced Embodied Reasoning in Robotics
Robotics
Teaches robots to do tasks better.
Reinforced Reasoning for Embodied Planning
Artificial Intelligence
Teaches robots to plan and act in new places.