AutoRubric-R1V: Rubric-Based Generative Rewards for Faithful Multimodal Reasoning

Published: October 16, 2025 | arXiv ID: 2510.14738v1

By: Mengzhao Jia, Zhihan Zhang, Ignacio Cases, and others

Potential Business Impact:

Teaches AI to think step-by-step, not just guess.

Business Areas:
Robotics Hardware, Science and Engineering, Software

Multimodal large language models (MLLMs) have rapidly advanced from perception tasks to complex multi-step reasoning, yet reinforcement learning with verifiable rewards (RLVR) often leads to spurious reasoning since only the final-answer correctness is rewarded. To address this limitation, we propose AutoRubric-R1V, a framework that integrates RLVR with process-level supervision through automatically collected rubric-based generative rewards. Our key innovation lies in a scalable self-aggregation method that distills consistent reasoning checkpoints from successful trajectories, enabling problem-specific rubric construction without human annotation or stronger teacher models. By jointly leveraging rubric-based and outcome rewards, AutoRubric-R1V achieves state-of-the-art performance on six multimodal reasoning benchmarks and substantially improves reasoning faithfulness in dedicated evaluations.
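The abstract describes jointly leveraging rubric-based (process-level) and outcome rewards. As a minimal sketch of that idea, the snippet below blends final-answer correctness with the fraction of rubric checkpoints a reasoning trajectory satisfies. All function names, the substring-matching check, and the linear weighting are illustrative assumptions, not the paper's exact formulation.

```python
# Hedged sketch of a combined rubric + outcome reward.
# Names, checkpoint matching, and the alpha weighting are assumptions.

def rubric_reward(trajectory: str, rubric: list[str]) -> float:
    """Fraction of rubric checkpoints covered by the reasoning trajectory."""
    if not rubric:
        return 0.0
    hits = sum(1 for checkpoint in rubric if checkpoint in trajectory)
    return hits / len(rubric)

def combined_reward(trajectory: str, answer_correct: bool,
                    rubric: list[str], alpha: float = 0.5) -> float:
    """Blend outcome correctness with process-level rubric coverage."""
    outcome = 1.0 if answer_correct else 0.0
    return alpha * outcome + (1 - alpha) * rubric_reward(trajectory, rubric)

# Example: the trajectory satisfies 1 of 2 checkpoints and the answer is correct.
r = combined_reward("first compute the area, then compare", True,
                    ["compute the area", "check units"])
print(r)  # 0.5*1.0 + 0.5*0.5 = 0.75
```

In the paper's setting the rubric checkpoints are distilled automatically from successful trajectories (the self-aggregation step), rather than hand-written as in this toy example.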

Country of Origin
🇺🇸 United States
Page Count
21 pages

Category
Computer Science:
Computation and Language