VARP: Reinforcement Learning from Vision-Language Model Feedback with Agent Regularized Preferences

Published: March 18, 2025 | arXiv ID: 2503.13817v1

By: Anukriti Singh, Amisha Bhaskar, Peihong Yu and more

Potential Business Impact:

Teaches robots to learn better by showing their path.

Business Areas:

Image Recognition, Data and Analytics, Software

Designing reward functions for continuous-control robotics often leads to subtle misalignments or reward hacking, especially in complex tasks. Preference-based RL mitigates some of these pitfalls by learning rewards from comparative feedback rather than hand-crafted signals, yet scaling human annotations remains challenging. Recent work uses Vision-Language Models (VLMs) to automate preference labeling, but a single final-state image generally fails to capture the agent's full motion. In this paper, we present a two-part solution that both improves feedback accuracy and better aligns reward learning with the agent's policy. First, we overlay trajectory sketches on final observations to reveal the path taken, allowing VLMs to provide more reliable preferences and improving preference accuracy by approximately 15-20% in Meta-World tasks. Second, we regularize reward learning by incorporating the agent's performance, ensuring that the reward model is optimized on data generated by the current policy; this addition boosts episode returns by 20-30% in locomotion tasks. Empirical studies on Meta-World show that our method achieves success rates of around 70-80% across all tasks, compared to below 50% for standard approaches. These results underscore the efficacy of combining richer visual representations with agent-aware reward regularization.
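The two ingredients named in the abstract, a trajectory sketch drawn over the final observation and a reward model trained from pairwise preferences, can be illustrated with a minimal sketch. Everything below is an assumption for illustration and not the authors' implementation: the function names `overlay_trajectory` and `preference_loss` are hypothetical, the image is assumed to be an H x W x 3 `uint8` array with the path already projected to pixel coordinates, and the loss is the standard Bradley-Terry cross-entropy commonly used in preference-based RL. The agent-regularization term is left out because the abstract does not give its form.

```python
import numpy as np


def overlay_trajectory(image, points, color=(255, 0, 0)):
    """Return a copy of `image` with a polyline drawn through `points`.

    Assumptions (not from the paper): `image` is an H x W x 3 uint8 array and
    `points` is a sequence of (x, y) pixel coordinates of the agent's path.
    """
    out = image.copy()
    h, w = out.shape[:2]
    pts = np.asarray(points, dtype=float)
    for (x0, y0), (x1, y1) in zip(pts[:-1], pts[1:]):
        # One sample per pixel along the longer axis, so the line has no gaps.
        n = int(max(abs(x1 - x0), abs(y1 - y0)))
        xs = np.rint(np.linspace(x0, x1, n + 1)).astype(int)
        ys = np.rint(np.linspace(y0, y1, n + 1)).astype(int)
        # Drop samples that fall outside the image.
        keep = (xs >= 0) & (xs < w) & (ys >= 0) & (ys < h)
        out[ys[keep], xs[keep]] = color
    return out


def preference_loss(returns_a, returns_b, labels):
    """Bradley-Terry cross-entropy over pairs of trajectory segments.

    returns_a / returns_b: summed predicted rewards for segments A and B.
    labels: 1 where the labeler (here, a VLM) preferred A, 0 where it preferred B.
    """
    ra = np.asarray(returns_a, dtype=float)
    rb = np.asarray(returns_b, dtype=float)
    y = np.asarray(labels, dtype=float)
    logits = ra - rb
    # -log(sigmoid(z)) == logaddexp(0, -z), which is numerically stable.
    loss = y * np.logaddexp(0.0, -logits) + (1.0 - y) * np.logaddexp(0.0, logits)
    return float(loss.mean())


if __name__ == "__main__":
    frame = np.zeros((64, 64, 3), dtype=np.uint8)    # stand-in final observation
    path = [(5, 60), (20, 40), (45, 35), (58, 10)]   # hypothetical pixel path
    sketched = overlay_trajectory(frame, path)
    print("pixels drawn:", int((sketched.sum(axis=2) > 0).sum()))
    print("loss:", preference_loss([2.0, 0.5], [1.0, 1.5], [1, 0]))
```

In a pipeline of this kind, the sketched image would be sent to the VLM in place of the raw final frame, and the returned labels would feed the preference loss. How the VLM is prompted and queried is outside what the abstract specifies.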

Enhancing Rating-Based Reinforcement Learning to Effectively Leverage Feedback from Large Vision-Language Models

Machine Learning (CS)

AI learns to guide robots better with AI feedback.

15 Jun 2025 1

91%

Vision-R1: Evolving Human-Free Alignment in Large Vision-Language Models via Vision-Guided Reinforcement Learning

CV and Pattern Recognition

Teaches AI to understand pictures better, faster.

23 Mar 2025 2

91%

A Vision-Language-Action-Critic Model for Robotic Real-World Reinforcement Learning

Robotics

Helps robots learn tasks faster and better.

19 Sep 2025 0

Country of Origin

🇺🇸 United States

Page Count

8 pages
