Reward Generation via Large Vision-Language Model in Offline Reinforcement Learning
By: Younghwan Lee, Tung M. Luu, Donghoon Lee, and more
Potential Business Impact:
Teaches computers to learn from old data alone.
In offline reinforcement learning (RL), learning from fixed datasets offers a promising solution for domains where real-time interaction with the environment is expensive or risky. However, designing dense reward signals for offline datasets requires significant human effort and domain expertise. Reinforcement learning from human feedback (RLHF) has emerged as an alternative, but it remains costly due to the human-in-the-loop process, prompting interest in automated reward generation models. To address this, we propose Reward Generation via Large Vision-Language Models (RG-VLM), which leverages the reasoning capabilities of large vision-language models (LVLMs) to generate rewards from offline data without human involvement. RG-VLM improves generalization in long-horizon tasks and can be seamlessly integrated with sparse reward signals to enhance task performance, demonstrating its potential as an auxiliary reward signal.
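To make the idea concrete, the sketch below shows one way a VLM-generated reward could be attached to an offline dataset and combined with the original sparse task reward as an auxiliary signal. It is a minimal illustration, not the paper's implementation: the names query_vlm, label_rewards, Transition, and the mixing weight alpha are all assumptions introduced here, and query_vlm is only a placeholder for an actual LVLM call.

    # Minimal sketch of VLM-based reward labeling for an offline RL dataset.
    # All names (query_vlm, label_rewards, alpha) are illustrative assumptions,
    # not the RG-VLM paper's actual code.

    from dataclasses import dataclass
    from typing import List
    import numpy as np


    @dataclass
    class Transition:
        observation: np.ndarray   # e.g. an RGB frame from the offline dataset
        action: np.ndarray
        sparse_reward: float      # original (often zero) task reward


    def query_vlm(observation: np.ndarray, task_description: str) -> float:
        """Hypothetical stand-in for a large vision-language model call that
        scores how much the observation reflects progress on the task.
        A real system would send the image plus a text prompt to an LVLM
        and parse a numeric score from its response."""
        return 0.0  # placeholder score in [0, 1]


    def label_rewards(dataset: List[Transition],
                      task_description: str,
                      alpha: float = 0.5) -> List[float]:
        """Add the VLM-generated reward to the sparse task reward, so the
        generated signal acts as an auxiliary shaping term."""
        rewards = []
        for t in dataset:
            vlm_reward = query_vlm(t.observation, task_description)
            rewards.append(t.sparse_reward + alpha * vlm_reward)
        return rewards


    if __name__ == "__main__":
        dummy = [Transition(np.zeros((64, 64, 3)), np.zeros(4), 0.0)]
        print(label_rewards(dummy, "open the drawer and place the block inside"))

The relabeled dataset could then be passed to any standard offline RL algorithm; the alpha weight simply controls how strongly the generated signal supplements the sparse reward.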
Similar Papers
Vision-R1: Evolving Human-Free Alignment in Large Vision-Language Models via Vision-Guided Reinforcement Learning
CV and Pattern Recognition
Teaches AI to understand pictures better, faster.
Enhancing Rating-Based Reinforcement Learning to Effectively Leverage Feedback from Large Vision-Language Models
Machine Learning (CS)
AI learns to guide robots better with AI feedback.
VL-GenRM: Enhancing Vision-Language Verification via Vision Experts and Iterative Training
Computation and Language
Teaches AI to see and talk better.