Auto-Rubric: Learning to Extract Generalizable Criteria for Reward Modeling
By: Lipeng Xie, Sen Huang, Zhuo Zhang, and more
Potential Business Impact:
Teaches AI to judge answers using rules learned from far fewer examples.
Reward models are essential for aligning Large Language Models (LLMs) with human values, yet their development is hampered by costly preference datasets and poor interpretability. While recent rubric-based approaches offer transparency, they often lack systematic quality control and optimization, creating a trade-off between scalability and reliability. We address these limitations with a novel, training-free framework built on a key assumption: evaluation rubrics underlying human preferences exhibit significant generalization ability across diverse queries, a property that enables remarkable data efficiency. Our two-stage approach first infers high-quality, query-specific rubrics using a validation-guided Propose-Evaluate-Revise pipeline. Second, it generalizes these granular rubrics into a compact, non-redundant core set by maximizing an information-theoretic coding rate. The final output is an interpretable, hierarchical "Theme-Tips" rubric set. Extensive experiments demonstrate the framework's exceptional data efficiency and performance. Critically, using just 70 preference pairs (1.5% of the source data), our method empowers smaller models like Qwen3-8B to outperform specialized, fully-trained counterparts. This work pioneers a scalable, interpretable, and data-efficient path for reward modeling.
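The abstract does not spell out how the coding-rate objective is optimized, so the sketch below is only one plausible reading: a greedy subset selection over rubric embeddings, scored with an MCR²-style coding rate so that each added rubric must contribute non-redundant information. The function names (coding_rate, select_core_rubrics), the embedding setup, and the greedy strategy are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def coding_rate(Z: np.ndarray, eps: float = 0.5) -> float:
    """MCR^2-style coding rate R(Z) = 1/2 * logdet(I + d/(n*eps^2) * Z^T Z)
    for row-wise embeddings Z of shape (n, d). Higher values mean the rows
    span more directions, i.e. the set is more diverse and less redundant."""
    n, d = Z.shape
    if n == 0:
        return 0.0
    scaled = np.eye(d) + (d / (n * eps ** 2)) * (Z.T @ Z)
    _, logdet = np.linalg.slogdet(scaled)
    return 0.5 * logdet

def select_core_rubrics(embeddings: np.ndarray, k: int, eps: float = 0.5) -> list[int]:
    """Greedily pick k rubric indices whose embeddings maximize the coding rate,
    approximating a compact, non-redundant core rubric set (illustrative only)."""
    selected: list[int] = []
    remaining = list(range(len(embeddings)))
    for _ in range(min(k, len(remaining))):
        base = coding_rate(embeddings[selected], eps)
        best_idx, best_gain = remaining[0], -np.inf
        for idx in remaining:
            gain = coding_rate(embeddings[selected + [idx]], eps) - base
            if gain > best_gain:
                best_idx, best_gain = idx, gain
        selected.append(best_idx)
        remaining.remove(best_idx)
    return selected

# Toy usage: 100 candidate query-specific rubrics embedded in 64 dims; keep a core set of 10.
rng = np.random.default_rng(0)
rubric_embeddings = rng.normal(size=(100, 64))
rubric_embeddings /= np.linalg.norm(rubric_embeddings, axis=1, keepdims=True)
print("Core rubric indices:", select_core_rubrics(rubric_embeddings, k=10))
```

Under this reading, greedy maximization favors rubrics pointing in directions not yet covered by the selected set, which matches the paper's stated goal of a compact, non-redundant core set; the actual grouping into hierarchical "Theme-Tips" rubrics would be a separate step not shown here.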
Similar Papers
RubricRL: Simple Generalizable Rewards for Text-to-Image Generation
CV and Pattern Recognition
Makes AI art follow your exact instructions better.
Online Rubrics Elicitation from Pairwise Comparisons
Computation and Language
Teaches computers to write better answers by changing rules.
Reinforcement Learning with Rubric Anchors
Artificial Intelligence
Teaches AI to write better, more human-like stories.