Fine-Tuning Diffusion-Based Recommender Systems via Reinforcement Learning with Reward Function Optimization
By: Yu Hou, Hua Li, Ha Young Kim, and more
Potential Business Impact:
Makes movie suggestions better and faster.
Diffusion models recently emerged as a powerful paradigm for recommender systems, offering state-of-the-art performance by modeling the generative process of user-item interactions. However, training such models from scratch is computationally expensive and yields diminishing returns once convergence is reached. To address these challenges, we propose ReFiT, a new framework that integrates Reinforcement learning (RL)-based Fine-Tuning into diffusion-based recommender systems. In contrast to prior RL approaches for diffusion models that depend on external reward models, ReFiT adopts a task-aligned design: it formulates the denoising trajectory as a Markov decision process (MDP) and incorporates a collaborative signal-aware reward function that directly reflects recommendation quality. By tightly coupling the MDP structure with this reward signal, ReFiT empowers the RL agent to exploit high-order connectivity for fine-grained optimization while avoiding the noisy or uninformative feedback common in naive reward designs. Leveraging policy gradient optimization, ReFiT maximizes the exact log-likelihood of observed interactions, thereby enabling effective post hoc fine-tuning of diffusion recommenders. Comprehensive experiments on a wide range of real-world datasets demonstrate that the proposed ReFiT framework (a) exhibits substantial performance gains over strong competitors (up to 36.3% on sequential recommendation), (b) achieves strong efficiency with linear complexity in the number of users or items, and (c) generalizes well across multiple diffusion-based recommendation scenarios. The source code and datasets are publicly available at https://anonymous.4open.science/r/ReFiT-4C60.
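The abstract describes the key ingredients (the denoising trajectory viewed as an MDP, a collaborative signal-aware reward, and policy-gradient optimization) but not the implementation details. The sketch below is only an illustration of what such a REINFORCE-style fine-tuning loop could look like; the model interface (sample_trajectory) and the reward function are hypothetical placeholders and are not taken from the paper.

# Minimal sketch (assumptions, not the authors' implementation) of policy-gradient
# fine-tuning for a pretrained diffusion recommender, where each reverse-diffusion
# (denoising) step is treated as an MDP action.
import torch

def refit_style_finetune(model, interactions, reward_fn, steps, lr=1e-4):
    """Fine-tune a pretrained diffusion recommender with REINFORCE.

    model        -- hypothetical diffusion recommender exposing sample_trajectory()
    interactions -- batch of observed user-item interaction vectors
    reward_fn    -- hypothetical collaborative-signal reward returning a per-user tensor
    """
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    for _ in range(steps):
        # Sample a denoising trajectory: states are noisy interaction vectors,
        # actions are the model's denoising predictions; log_probs is a list of
        # per-step, per-user log-likelihoods of the taken actions.
        trajectory, log_probs = model.sample_trajectory(interactions)  # hypothetical API

        # Collaborative signal-aware reward: score the final reconstruction
        # against the observed interactions (e.g., recall of held-out items).
        reward = reward_fn(trajectory[-1], interactions)

        # REINFORCE objective: weight the summed log-likelihood of the trajectory
        # by its reward and ascend the resulting policy gradient.
        loss = -(reward.detach() * torch.stack(log_probs).sum(dim=0)).mean()

        optimizer.zero_grad()
        loss.backward()
        optimizer.step()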
Similar Papers
Inference-Time Alignment Control for Diffusion Models with Reinforcement Learning Guidance
Machine Learning (CS)
Makes AI art better match what you want.
Reinforced Preference Optimization for Recommendation
Information Retrieval
Makes movie suggestions better by learning from mistakes.
Diffusion Fine-Tuning via Reparameterized Policy Gradient of the Soft Q-Function
Machine Learning (CS)
Makes AI art look better and more natural.