RLHF: A Comprehensive Survey for Cultural, Multimodal, and Low-Latency Alignment Methods
By: Raghav Sharma, Manan Mehta, Sai Tiger Raina
Potential Business Impact:
Helps AI systems follow human preferences and act fairly across cultures.
Reinforcement Learning from Human Feedback (RLHF) is the standard for aligning Large Language Models (LLMs), yet recent progress has moved beyond canonical text-based methods. This survey synthesizes the new frontier of alignment research by addressing critical gaps in multimodal alignment, cultural fairness, and low-latency optimization. To systematically explore these domains, we first review foundational algorithms, including Proximal Policy Optimization (PPO), Direct Preference Optimization (DPO), and Group Relative Policy Optimization (GRPO), before presenting a detailed analysis of the latest innovations. By providing a comparative synthesis of these techniques and outlining open challenges, this work serves as an essential roadmap for researchers building more robust, efficient, and equitable AI systems.
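To make the "foundational algorithms" mentioned in the abstract concrete, the sketch below shows the standard DPO objective, which replaces PPO's explicit reward model with a logistic loss on log-probability ratios against a frozen reference model. This is a minimal illustrative implementation, not code from the survey; the function name, tensor shapes, and toy values are assumptions for the example.

```python
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps, policy_rejected_logps,
             ref_chosen_logps, ref_rejected_logps, beta=0.1):
    """Direct Preference Optimization loss on a batch of preference pairs.

    Each argument is a 1-D tensor of summed log-probabilities of the chosen /
    rejected responses under the trainable policy or the frozen reference
    model. `beta` controls how far the policy may drift from the reference.
    """
    # Implicit reward of each response: beta * (log-prob ratio vs. reference).
    chosen_rewards = beta * (policy_chosen_logps - ref_chosen_logps)
    rejected_rewards = beta * (policy_rejected_logps - ref_rejected_logps)

    # Bradley-Terry preference objective: push the chosen response's implicit
    # reward above the rejected one's via a logistic loss on their margin.
    return -F.logsigmoid(chosen_rewards - rejected_rewards).mean()

# Toy usage with made-up log-probabilities for two preference pairs.
policy_chosen = torch.tensor([-12.3, -8.1])
policy_rejected = torch.tensor([-14.0, -9.5])
ref_chosen = torch.tensor([-12.5, -8.4])
ref_rejected = torch.tensor([-13.2, -9.1])
print(dpo_loss(policy_chosen, policy_rejected, ref_chosen, ref_rejected))
```

In practice the log-probabilities would come from scoring each response with the policy and reference models; the key design choice DPO makes, and the reason the survey groups it with PPO and GRPO, is that preference data is optimized directly without a separate reward model or on-policy rollouts.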
Similar Papers
Aligning to What? Limits to RLHF Based Alignment
Computation and Language
Fixes AI bias, but not perfectly yet.
Robust Reinforcement Learning from Human Feedback for Large Language Models Fine-Tuning
Machine Learning (Stat)
Makes AI better understand what people want.
RLHF Fine-Tuning of LLMs for Alignment with Implicit User Feedback in Conversational Recommenders
Machine Learning (CS)
Makes online suggestions better by learning from your actions.