AutoRubric-R1V: Rubric-Based Generative Rewards for Faithful Multimodal Reasoning
By: Mengzhao Jia, Zhihan Zhang, Ignacio Cases, and more
Potential Business Impact:
Teaches AI to think step-by-step, not just guess.
Multimodal large language models (MLLMs) have rapidly advanced from perception tasks to complex multi-step reasoning, yet reinforcement learning with verifiable rewards (RLVR) often induces spurious reasoning because it rewards only final-answer correctness. To address this limitation, we propose AutoRubric-R1V, a framework that integrates RLVR with process-level supervision through automatically collected rubric-based generative rewards. Our key innovation is a scalable self-aggregation method that distills consistent reasoning checkpoints from successful trajectories, enabling problem-specific rubric construction without human annotation or a stronger teacher model. By jointly leveraging rubric-based and outcome rewards, AutoRubric-R1V achieves state-of-the-art performance on six multimodal reasoning benchmarks and substantially improves reasoning faithfulness in dedicated evaluations.
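The abstract names two mechanisms: self-aggregating recurring reasoning checkpoints from successful trajectories into a problem-specific rubric, and mixing that rubric score with the usual outcome reward. The minimal Python sketch below illustrates that flow under stated assumptions: the function names, the substring match standing in for the paper's generative judge, the majority-vote threshold, and the mixing weight `lam` are all illustrative, not the authors' implementation.

```python
# Illustrative sketch of AutoRubric-R1V's reward structure as described in the
# abstract. All names, thresholds, and weights here are assumptions; the paper's
# rubric checking is done by a generative judge model, not substring matching.

def build_rubric(successful_trajectories: list[str],
                 candidate_checkpoints: list[str],
                 min_frac: float = 0.6) -> list[str]:
    """Self-aggregation sketch: keep checkpoints that recur across most
    successful trajectories, yielding a problem-specific rubric with no
    human annotation or stronger teacher model."""
    rubric = []
    for cp in candidate_checkpoints:
        frac = sum(cp in t for t in successful_trajectories) / len(successful_trajectories)
        if frac >= min_frac:  # min_frac is an assumed consistency threshold
            rubric.append(cp)
    return rubric

def rubric_reward(trajectory: str, rubric: list[str]) -> float:
    """Process-level signal: fraction of rubric checkpoints the reasoning
    satisfies. A substring check stands in for the generative judge here."""
    if not rubric:
        return 0.0
    hits = sum(1 for cp in rubric if cp in trajectory)
    return hits / len(rubric)

def outcome_reward(predicted: str, gold: str) -> float:
    """Standard RLVR signal: 1.0 if the final answer is verifiably correct."""
    return 1.0 if predicted.strip() == gold.strip() else 0.0

def combined_reward(trajectory: str, predicted: str, gold: str,
                    rubric: list[str], lam: float = 0.5) -> float:
    """Joint reward over outcome correctness and process-level rubric credit.
    The mixing weight lam is an assumption; the abstract does not state one."""
    return (1 - lam) * outcome_reward(predicted, gold) + lam * rubric_reward(trajectory, rubric)
```

The design intuition, per the abstract, is that crediting intermediate checkpoints discourages trajectories that reach the right answer through spurious reasoning, since such trajectories earn the outcome reward but little rubric reward.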
Similar Papers
Reinforcement Learning with Rubric Anchors
Artificial Intelligence
Teaches AI to write better, more human-like stories.
RubricRL: Simple Generalizable Rewards for Text-to-Image Generation
CV and Pattern Recognition
Makes AI art follow your exact instructions better.
An Efficient Rubric-based Generative Verifier for Search-Augmented LLMs
Computation and Language
Makes AI better at finding answers online.