Diffusion Fine-Tuning via Reparameterized Policy Gradient of the Soft Q-Function
By: Hyeongyu Kang, Jaewoo Lee, Woocheol Shin, and more
Potential Business Impact:
Makes AI art look better and more natural.
Diffusion models excel at generating high-likelihood samples but often require alignment with downstream objectives. Existing fine-tuning methods for diffusion models suffer significantly from reward over-optimization, resulting in high-reward but unnatural samples and degraded diversity. To mitigate over-optimization, we propose Soft Q-based Diffusion Finetuning (SQDF), a novel KL-regularized RL method for diffusion alignment that applies a reparameterized policy gradient of a training-free, differentiable estimate of the soft Q-function. SQDF is further enhanced with three innovations: a discount factor for proper credit assignment in the denoising process, the integration of consistency models to refine Q-function estimates, and the use of an off-policy replay buffer to improve mode coverage and manage the reward-diversity trade-off. Our experiments demonstrate that SQDF achieves superior target rewards while preserving diversity in text-to-image alignment. Furthermore, in online black-box optimization, SQDF attains high sample efficiency while maintaining naturalness and diversity.
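To make the core idea concrete, below is a minimal, illustrative sketch of what a single denoising-step loss along these lines could look like. It assumes hypothetical components not given in the abstract: `policy_unet` and `ref_unet` as noise-prediction networks, `reward_fn` mapping clean samples to scalar rewards, and precomputed DDPM coefficients `alphas_cumprod`. The soft Q-value at a noisy state is approximated training-free by scoring the one-step prediction of the clean sample with the reward model, the gradient flows through the reparameterized denoising step, a discount factor weights credit across steps, and a KL penalty against the frozen reference model regularizes the update. This is a sketch of the ideas named in the abstract, not the authors' reference implementation; the consistency-model refinement and the off-policy replay buffer are omitted.

```python
import torch

def sqdf_step_loss(policy_unet, ref_unet, reward_fn, x_t, t,
                   alphas_cumprod, beta_kl=0.1, gamma=0.99, num_steps=50):
    """Illustrative SQDF-style loss at denoising step t (names are assumptions)."""
    a_bar = alphas_cumprod[t].view(-1, 1, 1, 1)

    # Policy and frozen reference noise predictions.
    eps_pi = policy_unet(x_t, t)
    with torch.no_grad():
        eps_ref = ref_unet(x_t, t)

    # Training-free, differentiable soft-Q estimate: reward of the predicted
    # clean sample x0_hat obtained by one-step (Tweedie-style) denoising.
    x0_hat = (x_t - torch.sqrt(1.0 - a_bar) * eps_pi) / torch.sqrt(a_bar)
    q_soft = reward_fn(x0_hat)  # differentiable w.r.t. policy parameters

    # Discount factor for credit assignment across the denoising process.
    discount = gamma ** (num_steps - 1 - t).float()

    # Gaussian KL between policy and reference denoising steps reduces to a
    # scaled squared difference of their noise predictions (shared variance).
    kl_term = ((eps_pi - eps_ref) ** 2).mean(dim=(1, 2, 3))

    # Reparameterized policy gradient: maximize discounted soft Q, penalize KL.
    loss = (-discount * q_soft + beta_kl * kl_term).mean()
    return loss
```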
Similar Papers
Quantum Reinforcement Learning-Guided Diffusion Model for Image Synthesis via Hybrid Quantum-Classical Generative Model Architectures
Quantum Physics
Makes AI art look better by adjusting its settings.
Data-regularized Reinforcement Learning for Diffusion Models at Scale
Machine Learning (CS)
Makes AI create better videos that people like.
Fine-Tuning Diffusion-Based Recommender Systems via Reinforcement Learning with Reward Function Optimization
Information Retrieval
Makes movie suggestions better and faster.