
PhyGDPO: Physics-Aware Groupwise Direct Preference Optimization for Physically Consistent Text-to-Video Generation

Published: December 31, 2025 | arXiv ID: 2512.24551v1

By: Yuanhao Cai, Kunpeng Li, Menglin Jia and more

Potential Business Impact:

Makes videos follow real-world physics rules.

Business Areas:

Image Recognition, Data and Analytics, Software

Recent advances in text-to-video (T2V) generation have achieved good visual quality, yet synthesizing videos that faithfully follow physical laws remains an open challenge. Existing methods, mainly based on graphics or prompt extension, struggle to generalize beyond simple simulated environments or to learn implicit physical reasoning. The scarcity of training data with rich physical interactions and phenomena is a further obstacle. In this paper, we first introduce a Physics-Augmented video data construction Pipeline, PhyAugPipe, that leverages a vision-language model (VLM) with chain-of-thought reasoning to collect a large-scale training dataset, PhyVidGen-135K. We then formulate a principled Physics-aware Groupwise Direct Preference Optimization (PhyGDPO) framework that builds upon the groupwise Plackett-Luce probabilistic model to capture holistic preferences beyond pairwise comparisons. In PhyGDPO, we design a Physics-Guided Rewarding (PGR) scheme that embeds VLM-based physics rewards to steer optimization toward physical consistency. We also propose a LoRA-Switch Reference (LoRA-SR) scheme that eliminates memory-heavy reference duplication for efficient training. Experiments show that our method significantly outperforms state-of-the-art open-source methods on PhyGenBench and VideoPhy2. Please check our project page at https://caiyuanhao1998.github.io/project/PhyGDPO for more video results. Our code, models, and data will be released at https://github.com/caiyuanhao1998/Open-PhyGDPO.
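To make the "groupwise Plackett-Luce" idea in the abstract concrete, here is a minimal sketch of the generic Plackett-Luce log-likelihood of a ranking. It is not the paper's actual loss: the function name and the choice of plain floating-point scores are illustrative assumptions. In a DPO-style setup the scores would typically be implicit rewards derived from policy and reference log-probabilities, but the abstract does not give those details, so they are left out here.

```python
import math


def plackett_luce_log_likelihood(scores_in_rank_order):
    """Log-probability of a ranking under the generic Plackett-Luce model.

    scores_in_rank_order: real-valued scores, listed from the most
    preferred item to the least preferred one. (Hypothetical helper,
    not taken from the PhyGDPO code.)
    """
    log_lik = 0.0
    for k in range(len(scores_in_rank_order)):
        # Items that have not been "picked" yet at step k.
        remaining = scores_in_rank_order[k:]
        # Numerically stable log-sum-exp over the remaining scores.
        m = max(remaining)
        log_norm = m + math.log(sum(math.exp(s - m) for s in remaining))
        # Probability of picking item k first among the remaining items.
        log_lik += scores_in_rank_order[k] - log_norm
    return log_lik


# Example: a group of three candidates ranked best to worst.
group_scores = [2.0, 0.5, -1.0]
loss = -plackett_luce_log_likelihood(group_scores)
```

With only two items, this expression reduces to the log-sigmoid of the score difference, which is the pairwise Bradley-Terry form used by standard DPO; a group of more than two items lets a single term account for the whole ordering.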

Hierarchical Fine-grained Preference Optimization for Physically Plausible Video Generation

CV and Pattern Recognition

Makes AI videos follow real-world physics rules.

14 Aug 2025

91%

PhysCorr: Dual-Reward DPO for Physics-Constrained Text-to-Video Generation with Automated Preference Selection

CV and Pattern Recognition

Makes computer videos move like real life.

6 Nov 2025

90%

Diverse Video Generation with Determinantal Point Process-Guided Policy Optimization

CV and Pattern Recognition

Makes AI create many different videos from one idea.

25 Nov 2025


Page Count

16 pages


