Score: 3

Improving the Physics of Video Generation with VJEPA-2 Reward Signal

Published: October 22, 2025 | arXiv ID: 2510.21840v1

By: Jianhao Yuan, Xiaofeng Zhang, Felix Friedrich and more

BigTech Affiliations: Meta

Potential Business Impact:

Makes AI-generated videos follow real-world physics more closely.

Business Areas:
Motion Capture, Media and Entertainment, Video

This is a short technical report describing the winning entry of the PhysicsIQ Challenge, presented at the Perception Test Workshop at ICCV 2025. State-of-the-art video generative models exhibit severely limited physical understanding and often produce implausible videos. The Physics IQ benchmark has shown that visual realism does not imply physics understanding. Yet intuitive physics understanding has been shown to emerge from self-supervised learning (SSL) pretraining on natural videos. In this report, we investigate whether we can leverage SSL-based video world models to improve the physics plausibility of video generative models. In particular, we build on top of the state-of-the-art video generative model MAGI-1 and couple it with the recently introduced Video Joint Embedding Predictive Architecture 2 (VJEPA-2) to guide the generation process. We show that by leveraging VJEPA-2 as a reward signal, we can improve the physics plausibility of state-of-the-art video generative models by ~6%.
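
The abstract does not spell out how the reward guides generation, so the following is only a minimal sketch of one generic way a world-model reward could be used: best-of-N reranking, where several candidate videos are sampled and the one the predictor finds least surprising is kept. Everything here is a hypothetical stand-in and not the report's method: `toy_predictor` plays the role of a VJEPA-2-style predictor, the "videos" are short lists of feature vectors rather than MAGI-1 outputs, and `surprise_reward` and `best_of_n` are illustrative names.

```python
from typing import Callable, List, Sequence

Frame = List[float]   # toy frame: a small feature vector (assumption)
Video = List[Frame]   # toy video: a sequence of frames (assumption)


def toy_predictor(frame: Sequence[float]) -> Frame:
    """Hypothetical world model: assumes every feature advances by 1 per frame."""
    return [x + 1.0 for x in frame]


def surprise_reward(video: Video,
                    predict_next: Callable[[Sequence[float]], Frame]) -> float:
    """Negative mean squared prediction error over consecutive frame pairs.

    Higher is better: a video the predictor anticipates well is treated
    as more physically plausible under this toy assumption.
    """
    errors = []
    for current, nxt in zip(video, video[1:]):
        predicted = predict_next(current)
        mse = sum((p - a) ** 2 for p, a in zip(predicted, nxt)) / len(nxt)
        errors.append(mse)
    return -sum(errors) / len(errors)


def best_of_n(generate: Callable[[], Video],
              reward: Callable[[Video], float],
              n: int) -> Video:
    """Sample n candidates and return the one with the highest reward."""
    candidates = [generate() for _ in range(n)]
    return max(candidates, key=reward)


if __name__ == "__main__":
    smooth = [[0.0], [1.0], [2.0]]   # matches the toy predictor exactly
    jumpy = [[0.0], [5.0], [1.0]]    # violates the constant-velocity assumption
    pool = iter([jumpy, smooth])     # stand-in for a stochastic generator
    chosen = best_of_n(lambda: next(pool),
                       lambda v: surprise_reward(v, toy_predictor),
                       n=2)
    print(chosen is smooth)
```

Reranking is chosen here only because it needs no access to the generator's internals; a reward like this could in principle also steer sampling step by step, which the phrase "guide the generation process" may refer to.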

Country of Origin
🇺🇸 🇬🇧 United States, United Kingdom

Page Count
2 pages

Category
Computer Science:
CV and Pattern Recognition