DenseDPO: Fine-Grained Temporal Preference Optimization for Video Diffusion Models
By: Ziyi Wu, Anil Kag, Ivan Skorokhodov, and more
Potential Business Impact:
Makes AI videos move better with less data.
Direct Preference Optimization (DPO) has recently been applied as a post-training technique for text-to-video diffusion models. To obtain training data, annotators are asked to state a preference between two videos generated from independent noise. However, this approach precludes fine-grained comparisons, and we point out that it biases annotators towards low-motion clips, which often contain fewer visual artifacts. In this work, we introduce DenseDPO, a method that addresses these shortcomings through three contributions. First, we create each video pair for DPO by denoising corrupted copies of a ground-truth video. This yields aligned pairs with similar motion structure that differ only in local details, effectively neutralizing the motion bias. Second, we leverage the resulting temporal alignment to label preferences on short segments rather than entire clips, yielding a denser and more precise learning signal. With only one-third of the labeled data, DenseDPO greatly improves motion generation over vanilla DPO while matching it in text alignment, visual quality, and temporal consistency. Finally, we show that DenseDPO unlocks automatic preference annotation with off-the-shelf Vision Language Models (VLMs): GPT predicts segment-level preferences about as accurately as video reward models fine-tuned specifically for this task, and DenseDPO trained on these labels performs close to training on human labels.
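For readers who want the mechanics, the two core ideas of the abstract, aligned pair construction and segment-level preference optimization, lend themselves to a short sketch. The PyTorch code below is a minimal illustration under stated assumptions, not the authors' implementation: `denoise_fn`, the toy variance-preserving corruption, the per-segment log-likelihood interface, and all tensor shapes are hypothetical stand-ins. In diffusion DPO the log-likelihoods are typically replaced by a denoising-loss surrogate, a detail the abstract does not specify.

```python
import torch
import torch.nn.functional as F


def make_aligned_pair(gt_latent, denoise_fn, t_corrupt=0.6):
    """Build a temporally aligned video pair from one ground-truth clip.

    Both candidates are denoised from the SAME partially noised copy of
    the ground truth, so they share motion structure and differ only in
    local details -- the property used to neutralize the motion bias.

    gt_latent:  (T, C, H, W) latent video tensor (assumed shape).
    denoise_fn: hypothetical callable(latent, t, seed) -> latent that
                runs reverse diffusion from noise level t.
    t_corrupt:  how far along the forward noise schedule to corrupt;
                a stand-in for the model's actual schedule.
    """
    noise = torch.randn_like(gt_latent)
    alpha_bar = 1.0 - t_corrupt  # toy VP-style schedule (assumption)
    noised = alpha_bar ** 0.5 * gt_latent + (1.0 - alpha_bar) ** 0.5 * noise
    # Two independent denoising runs give two aligned candidate videos.
    return denoise_fn(noised, t_corrupt, seed=0), denoise_fn(noised, t_corrupt, seed=1)


def dense_dpo_loss(logp_w, logp_l, logp_ref_w, logp_ref_l, seg_mask, beta=0.1):
    """Segment-level DPO objective (a sketch, not the paper's exact loss).

    All log-probability tensors have shape (B, S): one entry per short
    temporal segment rather than per whole clip. seg_mask (B, S) is 1
    where annotators labeled a preference for that segment and 0 where
    it was skipped -- the denser learning signal described above.
    """
    # Standard DPO logits, computed independently for each segment.
    logits = beta * ((logp_w - logp_ref_w) - (logp_l - logp_ref_l))
    per_segment = -F.logsigmoid(logits)
    # Average only over segments that actually carry a label.
    return (per_segment * seg_mask).sum() / seg_mask.sum().clamp(min=1)
```

Masking unlabeled segments rather than dropping whole clips is what lets the method extract more supervision per annotated video; with whole-clip labels, seg_mask collapses to all-ones with S = 1 and the loss reduces to vanilla DPO.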
Similar Papers
Discriminator-Free Direct Preference Optimization for Video Diffusion
CV and Pattern Recognition
Makes videos look better by learning from mistakes.
VistaDPO: Video Hierarchical Spatial-Temporal Direct Preference Optimization for Large Video Models
CV and Pattern Recognition
Makes AI understand videos better, like people do.
Preference-Based Alignment of Discrete Diffusion Models
Machine Learning (CS)
Teaches AI to make better choices without rewards.