Towards Faithful and Controllable Personalization via Critique-Post-Edit Reinforcement Learning
By: Chenghao Zhu, Meiling Tao, Tiannan Wang, and more
Potential Business Impact:
Teaches AI to write exactly how you like it.
Faithfully personalizing large language models (LLMs) to align with individual user preferences is a critical but challenging task. Supervised fine-tuning (SFT) quickly reaches a performance plateau, and standard reinforcement learning from human feedback (RLHF) also struggles with the nuances of personalization: scalar-based reward models are prone to reward hacking, which leads to verbose and superficially personalized responses. To address these limitations, we propose Critique-Post-Edit, a robust reinforcement learning framework that enables more faithful and controllable personalization. Our framework integrates two key components: (1) a Personalized Generative Reward Model (GRM) that provides multi-dimensional scores and textual critiques to resist reward hacking, and (2) a Critique-Post-Edit mechanism in which the policy model revises its own outputs based on these critiques for more targeted and efficient learning. Under a rigorous length-controlled evaluation, our method substantially outperforms standard PPO on personalization benchmarks. The personalized Qwen2.5-7B model achieves an average 11% win-rate improvement, and the personalized Qwen2.5-14B model surpasses the performance of GPT-4.1. These results demonstrate a practical path to faithful, efficient, and controllable personalization.
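To make the described loop concrete, here is a minimal Python sketch of how a Critique-Post-Edit step could be wired together, based only on the abstract. All function names (`generate`, `grm_evaluate`, `post_edit`) and the score dimensions are illustrative assumptions, not the authors' actual API or implementation; the PPO update itself is omitted.

```python
# Minimal sketch of a Critique-Post-Edit training step, assuming a GRM that
# returns multi-dimensional scores plus a textual critique, and a policy that
# can revise its own draft conditioned on that critique.
from dataclasses import dataclass


@dataclass
class GRMFeedback:
    scores: dict      # multi-dimensional scores, e.g. {"preference_fit": 0.6, ...}
    critique: str     # textual critique of the draft response


def generate(policy, profile, prompt):
    # Placeholder: sample a draft response from the policy model.
    return f"[draft response to {prompt!r} for profile {profile!r}]"


def grm_evaluate(profile, prompt, response):
    # Placeholder: the Personalized Generative Reward Model scores the response
    # along several dimensions and writes a critique (harder to reward-hack
    # than a single scalar).
    return GRMFeedback(
        scores={"preference_fit": 0.6, "helpfulness": 0.8, "conciseness": 0.5},
        critique="Tone matches the user, but the answer is too verbose.",
    )


def post_edit(policy, profile, prompt, draft, critique):
    # Placeholder: the policy revises its own draft based on the critique.
    return f"[revised response addressing: {critique}]"


def reward_from(scores):
    # Aggregate the multi-dimensional scores into a scalar training signal.
    return sum(scores.values()) / len(scores)


def critique_post_edit_step(policy, profile, prompt):
    draft = generate(policy, profile, prompt)
    feedback = grm_evaluate(profile, prompt, draft)
    revised = post_edit(policy, profile, prompt, draft, feedback.critique)
    revised_feedback = grm_evaluate(profile, prompt, revised)
    # In the full framework these (response, reward) pairs would feed a
    # PPO-style policy update; here we just return them.
    return [
        (draft, reward_from(feedback.scores)),
        (revised, reward_from(revised_feedback.scores)),
    ]


if __name__ == "__main__":
    batch = critique_post_edit_step(
        policy=None,
        profile={"style": "concise, informal"},
        prompt="Summarize today's meeting notes.",
    )
    for response, reward in batch:
        print(f"reward={reward:.2f}  response={response}")
```

The key design point suggested by the abstract is that the critique is consumed by the policy itself (via the post-edit step), so the learning signal is targeted revision rather than blind reward maximization.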
Similar Papers
Iterative Critique-Refine Framework for Enhancing LLM Personalization
Computation and Language
Makes AI write like your favorite author.
A Mathematical Framework for Custom Reward Functions in Job Application Evaluation using Reinforcement Learning
Machine Learning (CS)
Helps hiring software find better job candidates.
Robust Reinforcement Learning from Human Feedback for Large Language Models Fine-Tuning
Machine Learning (Stat)
Makes AI understand what people want better.