BaseReward: A Strong Baseline for Multimodal Reward Model

Published: September 19, 2025 | arXiv ID: 2509.16127v1

By: Yi-Fan Zhang, Haihua Yang, Huanyu Zhang and more

Potential Business Impact:

Teaches AI to understand and judge images and text.

Business Areas:

Multi-level Marketing, Sales and Marketing

The rapid advancement of Multimodal Large Language Models (MLLMs) has made aligning them with human preferences a critical challenge. Reward Models (RMs) are a core technology for achieving this goal, but a systematic guide for building state-of-the-art Multimodal Reward Models (MRMs) is currently lacking in both academia and industry. Through exhaustive experimental analysis, this paper aims to provide a clear "recipe" for constructing high-performance MRMs. We systematically investigate every crucial component in the MRM development pipeline, including reward modeling paradigms (e.g., Naive-RM, Critic-based RM, and Generative RM), reward head architecture, training strategies, data curation (covering over ten multimodal and text-only preference datasets), backbone model and model scale, and ensemble methods. Based on these experimental insights, we introduce BaseReward, a powerful and efficient baseline for multimodal reward modeling. BaseReward adopts a simple yet effective architecture, built upon a Qwen2.5-VL backbone, featuring an optimized two-layer reward head, and is trained on a carefully curated mixture of high-quality multimodal and text-only preference data. Our results show that BaseReward establishes a new state of the art (SOTA) on major benchmarks such as MM-RLHF-Reward Bench, VL-Reward Bench, and Multimodal Reward Bench, outperforming previous models. Furthermore, to validate its practical utility beyond static benchmarks, we integrate BaseReward into a real-world reinforcement learning pipeline, successfully enhancing an MLLM's performance across various perception, reasoning, and conversational tasks. This work not only delivers a top-tier MRM but, more importantly, provides the community with a clear, empirically backed guide for developing robust reward models for the next generation of MLLMs.
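To make the "two-layer reward head" and the preference-data training mentioned in the abstract more concrete, here is a minimal, dependency-free sketch. It is not the authors' implementation. The abstract does not state the activation function, the pooling of backbone features, the layer sizes, or the loss, so the SiLU activation, the pooled feature vector, the dimensions, and the Bradley-Terry pairwise loss below are all assumptions chosen for illustration. The function names `make_head`, `reward`, and `pairwise_loss` are hypothetical.

```python
import math
import random


def make_head(hidden_dim, inner_dim, seed=0):
    """Randomly initialise a two-layer reward head (sizes are hypothetical)."""
    rng = random.Random(seed)
    w1 = [[rng.gauss(0.0, 0.1) for _ in range(hidden_dim)] for _ in range(inner_dim)]
    b1 = [0.0] * inner_dim
    w2 = [rng.gauss(0.0, 0.1) for _ in range(inner_dim)]
    b2 = 0.0
    return w1, b1, w2, b2


def reward(head, features):
    """Map one pooled backbone feature vector to a scalar reward."""
    w1, b1, w2, b2 = head
    # Layer 1: linear projection followed by an activation (SiLU is an assumption).
    hidden = []
    for row, bias in zip(w1, b1):
        z = sum(w * x for w, x in zip(row, features)) + bias
        hidden.append(z / (1.0 + math.exp(-z)))
    # Layer 2: linear projection down to a single score.
    return sum(w * h for w, h in zip(w2, hidden)) + b2


def pairwise_loss(r_chosen, r_rejected):
    """Bradley-Terry preference loss (assumed): -log(sigmoid(r_chosen - r_rejected))."""
    margin = r_chosen - r_rejected
    # Equivalent to softplus(-margin), written in a numerically stable form.
    return max(-margin, 0.0) + math.log1p(math.exp(-abs(margin)))


if __name__ == "__main__":
    head = make_head(hidden_dim=4, inner_dim=3)
    # Stand-ins for pooled backbone features of a preferred and a rejected response.
    chosen = [0.5, -1.0, 2.0, 0.0]
    rejected = [0.1, 0.3, -0.7, 1.2]
    print(pairwise_loss(reward(head, chosen), reward(head, rejected)))
```

In a real setup the feature vectors would come from the multimodal backbone rather than hand-written lists, and the head's weights would be updated by gradient descent on the loss. The loss shrinks as the chosen response's reward rises above the rejected one's, which is the behaviour a scalar reward model needs when it is trained on preference pairs.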

A Systematic Analysis of Base Model Choice for Reward Modeling

Computation and Language

Improves AI writing by picking the best starting AI.

16 May 2025 0

90%

R1-Reward: Training Multimodal Reward Model Through Stable Reinforcement Learning

CV and Pattern Recognition

Teaches AI to judge pictures and words better.

5 May 2025 2

89%

Beyond Monolithic Rewards: A Hybrid and Multi-Aspect Reward Optimization for MLLM Alignment

Artificial Intelligence

Teaches AI to follow instructions better.

6 Oct 2025 3

Page Count

17 pages
