A Survey on Progress in LLM Alignment from the Perspective of Reward Design
By: Miaomiao Ji, Yanqiu Wu, Zhibin Wu, and more
Potential Business Impact:
Helps AI systems act in line with human values.
Reward design plays a pivotal role in aligning large language models (LLMs) with human values, serving as the bridge between feedback signals and model optimization. This survey provides a structured organization of reward modeling and addresses three key aspects: mathematical formulation, construction practices, and interaction with optimization paradigms. Building on this, it develops a macro-level taxonomy that characterizes reward mechanisms along complementary dimensions, thereby offering both conceptual clarity and practical guidance for alignment research. The progression of LLM alignment can be understood as a continuous refinement of reward design strategies, with recent developments highlighting paradigm shifts from reinforcement learning (RL)-based to RL-free optimization, and from single-task settings to multi-objective and more complex ones.
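As a concrete illustration of the reward-design formulations the abstract refers to, the sketch below writes out the standard pairwise (Bradley-Terry) reward-model loss, the KL-regularized policy objective used in RL-based alignment, and the DPO loss that typifies the RL-free shift. The notation (x for prompts, y_w/y_l for preferred/rejected responses, r_phi for the reward model, pi_theta/pi_ref for the policy and reference model, beta for the KL weight) is the conventional one from the RLHF/DPO literature and is an assumption here, not quoted from the survey itself.

% Pairwise (Bradley-Terry) reward-model objective over preferred/rejected pairs (y_w, y_l):
\mathcal{L}_{\mathrm{RM}}(\phi) = -\,\mathbb{E}_{(x,\,y_w,\,y_l)\sim\mathcal{D}}\big[\log \sigma\big(r_\phi(x, y_w) - r_\phi(x, y_l)\big)\big]

% KL-regularized policy objective optimized by RL-based methods (e.g., PPO) against the learned reward:
\max_{\theta}\; \mathbb{E}_{x\sim\mathcal{D},\; y\sim\pi_\theta(\cdot\mid x)}\big[r_\phi(x, y)\big] \;-\; \beta\,\mathbb{D}_{\mathrm{KL}}\big(\pi_\theta(\cdot\mid x)\,\|\,\pi_{\mathrm{ref}}(\cdot\mid x)\big)

% RL-free counterpart (DPO): the reward is expressed implicitly through the policy itself:
\mathcal{L}_{\mathrm{DPO}}(\theta) = -\,\mathbb{E}_{(x,\,y_w,\,y_l)\sim\mathcal{D}}\Big[\log \sigma\Big(\beta\log\tfrac{\pi_\theta(y_w\mid x)}{\pi_{\mathrm{ref}}(y_w\mid x)} - \beta\log\tfrac{\pi_\theta(y_l\mid x)}{\pi_{\mathrm{ref}}(y_l\mid x)}\Big)\Big]

The DPO loss follows from substituting the closed-form optimum of the KL-regularized objective back into the Bradley-Terry model, which is why the RL-free shift can be read as a reparameterization of the same reward-design problem rather than an abandonment of it.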
Similar Papers
Reinforcement Learning Meets Large Language Models: A Survey of Advancements and Applications Across the LLM Lifecycle
Computation and Language
Teaches computers to think and follow instructions better.
Reward Model Perspectives: Whose Opinions Do Reward Models Reward?
Computation and Language
Makes AI less biased and fairer to everyone.
Beyond Monolithic Rewards: A Hybrid and Multi-Aspect Reward Optimization for MLLM Alignment
Artificial Intelligence
Teaches AI to follow instructions better.