
Ranking-based Preference Optimization for Diffusion Models from Implicit User Feedback

Published: October 21, 2025 | arXiv ID: 2510.18353v1

By: Yi-Lun Wu, Bo-Kai Ruan, Chiang Tseng, and more

Potential Business Impact:

Makes AI art better match what you want.

Business Areas:

Image Recognition, Data and Analytics, Software

Direct preference optimization (DPO) methods have shown strong potential in aligning text-to-image diffusion models with human preferences by training on paired comparisons. These methods improve training stability by avoiding the REINFORCE algorithm but still struggle with challenges such as accurately estimating image probabilities due to the non-linear nature of the sigmoid function and the limited diversity of offline datasets. In this paper, we introduce Diffusion Denoising Ranking Optimization (Diffusion-DRO), a new preference learning framework grounded in inverse reinforcement learning. Diffusion-DRO removes the dependency on a reward model by casting preference learning as a ranking problem, thereby simplifying the training objective into a denoising formulation and overcoming the non-linear estimation issues found in prior methods. Moreover, Diffusion-DRO uniquely integrates offline expert demonstrations with online policy-generated negative samples, enabling it to effectively capture human preferences while addressing the limitations of offline data. Comprehensive experiments show that Diffusion-DRO delivers improved generation quality across a range of challenging and unseen prompts, outperforming state-of-the-art baselines in both quantitative metrics and user studies. Our source code and pre-trained models are available at https://github.com/basiclab/DiffusionDRO.
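The abstract contrasts sigmoid-based DPO objectives with a ranking objective that reduces to a denoising formulation. The standard-library Python sketch below makes that contrast concrete under stated assumptions. `dpo_style_loss` follows the general shape of the Diffusion-DPO loss: a log-sigmoid of differences in denoising error measured against a frozen reference model, with any timestep weighting folded into `beta`. `linear_rank_loss` is a hypothetical sigmoid-free contrast between an offline expert sample and an online policy-generated negative sample. It is not the exact Diffusion-DRO objective, which the abstract does not state. All function names are illustrative.

```python
import math
from typing import Sequence


def sq_err(eps: Sequence[float], eps_pred: Sequence[float]) -> float:
    """Squared L2 denoising error ||eps - eps_pred||^2 for one sample."""
    if len(eps) != len(eps_pred):
        raise ValueError("noise and prediction must have the same length")
    return sum((a - b) ** 2 for a, b in zip(eps, eps_pred))


def softplus(x: float) -> float:
    """Numerically stable log(1 + exp(x)), i.e. -log(sigmoid(-x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def dpo_style_loss(err_w: float, err_w_ref: float,
                   err_l: float, err_l_ref: float, beta: float) -> float:
    """DPO-style loss -log sigmoid(-beta * d) for one (winner, loser) pair.

    d > 0 means the trained model improved less (relative to the frozen
    reference) on the preferred sample w than on the dispreferred sample l.
    Assumption of this sketch: any timestep weighting is folded into beta.
    """
    d = (err_w - err_w_ref) - (err_l - err_l_ref)
    return softplus(beta * d)


def linear_rank_loss(err_expert: float, err_negative: float) -> float:
    """Hypothetical sigmoid-free contrast: favor low denoising error on an
    offline expert sample and high error on an online negative sample.
    Not the paper's exact objective; unbounded below as written."""
    return err_expert - err_negative


if __name__ == "__main__":
    # Per-timestep error differences d_t for one pair (illustrative numbers).
    ds = [-2.0, 0.0, 2.0]
    mean_of_losses = sum(dpo_style_loss(d, 0.0, 0.0, 0.0, beta=1.0)
                         for d in ds) / len(ds)
    loss_of_mean = dpo_style_loss(sum(ds) / len(ds), 0.0, 0.0, 0.0, beta=1.0)
    # Jensen gap: softplus is convex, so the first value exceeds the second.
    print(mean_of_losses, loss_of_mean)

    # A linear contrast commutes with averaging: both values are equal.
    expert, negative = [1.0, 2.0, 3.0], [2.0, 2.0, 2.0]
    mean_linear = sum(map(linear_rank_loss, expert, negative)) / len(expert)
    linear_of_mean = linear_rank_loss(sum(expert) / len(expert),
                                      sum(negative) / len(negative))
    print(mean_linear, linear_of_mean)
```

Because -log σ is convex, averaging per-timestep losses is not the same as applying the loss to the averaged error difference (Jensen's inequality). A per-sample Monte Carlo estimate therefore only bounds the intended objective. A linear contrast commutes with averaging. This is one way to read the abstract's "non-linear estimation" remark, offered as an illustration rather than as the paper's own derivation.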

Direct Diffusion Score Preference Optimization via Stepwise Contrastive Policy-Pair Supervision

CV and Pattern Recognition

Makes AI art better match your ideas.

29 Dec 2025

Diffusion-SDPO: Safeguarded Direct Preference Optimization for Diffusion Models

CV and Pattern Recognition

Makes AI art follow your words better.

5 Nov 2025

Smoothed Preference Optimization via ReNoise Inversion for Aligning Diffusion Models with Varied Human Preferences

CV and Pattern Recognition

Makes AI art better by learning what people like.

3 Jun 2025

Country of Origin

🇹🇼 Taiwan, Province of China

Repos / Data Links

github.com

Page Count

28 pages