Reward Generation via Large Vision-Language Model in Offline Reinforcement Learning
By: Younghwan Lee, Tung M. Luu, Donghoon Lee, and more
Potential Business Impact:
Teaches computers to learn from old data alone.
In offline reinforcement learning (RL), learning from fixed datasets offers a promising solution for domains where real-time interaction with the environment is expensive or risky. However, designing dense reward signals for offline datasets requires significant human effort and domain expertise. Reinforcement learning from human feedback (RLHF) has emerged as an alternative, but it remains costly due to the human-in-the-loop process, prompting interest in automated reward generation models. To address this, we propose Reward Generation via Large Vision-Language Models (RG-VLM), which leverages the reasoning capabilities of large vision-language models (LVLMs) to generate rewards from offline data without human involvement. RG-VLM improves generalization in long-horizon tasks and can be seamlessly integrated with sparse reward signals to enhance task performance, demonstrating its potential as an auxiliary reward signal.
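To make the idea concrete, the sketch below shows one way a VLM-generated reward could be attached to an offline dataset and combined with the original sparse task reward as an auxiliary signal. It is a minimal illustration, not the paper's implementation: the names query_vlm, label_rewards, Transition, and the mixing weight alpha are all assumptions introduced here, and query_vlm is only a placeholder for an actual LVLM call.

    # Minimal sketch of VLM-based reward labeling for an offline RL dataset.
    # All names (query_vlm, label_rewards, alpha) are illustrative assumptions,
    # not the RG-VLM paper's actual code.

    from dataclasses import dataclass
    from typing import List
    import numpy as np


    @dataclass
    class Transition:
        observation: np.ndarray   # e.g. an RGB frame from the offline dataset
        action: np.ndarray
        sparse_reward: float      # original (often zero) task reward


    def query_vlm(observation: np.ndarray, task_description: str) -> float:
        """Hypothetical stand-in for a large vision-language model call that
        scores how much the observation reflects progress on the task.
        A real system would send the image plus a text prompt to an LVLM
        and parse a numeric score from its response."""
        return 0.0  # placeholder score in [0, 1]


    def label_rewards(dataset: List[Transition],
                      task_description: str,
                      alpha: float = 0.5) -> List[float]:
        """Add the VLM-generated reward to the sparse task reward, so the
        generated signal acts as an auxiliary shaping term."""
        rewards = []
        for t in dataset:
            vlm_reward = query_vlm(t.observation, task_description)
            rewards.append(t.sparse_reward + alpha * vlm_reward)
        return rewards


    if __name__ == "__main__":
        dummy = [Transition(np.zeros((64, 64, 3)), np.zeros(4), 0.0)]
        print(label_rewards(dummy, "open the drawer and place the block inside"))

The relabeled dataset could then be passed to any standard offline RL algorithm; the alpha weight simply controls how strongly the generated signal supplements the sparse reward.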
Similar Papers
Vision-R1: Evolving Human-Free Alignment in Large Vision-Language Models via Vision-Guided Reinforcement Learning
CV and Pattern Recognition
Teaches AI to understand pictures better, faster.
Enhancing Rating-Based Reinforcement Learning to Effectively Leverage Feedback from Large Vision-Language Models
Machine Learning (CS)
AI learns to guide robots better with AI feedback.
VL-GenRM: Enhancing Vision-Language Verification via Vision Experts and Iterative Training
Computation and Language
Teaches AI to see and talk better.