Improving the Physics of Video Generation with VJEPA-2 Reward Signal
By: Jianhao Yuan, Xiaofeng Zhang, Felix Friedrich and more
Potential Business Impact:
Makes computer-generated videos follow real-world physics rules.
This is a short technical report describing the winning entry of the PhysicsIQ Challenge, presented at the Perception Test Workshop at ICCV 2025. State-of-the-art video generative models exhibit severely limited physical understanding and often produce implausible videos. The Physics IQ benchmark has shown that visual realism does not imply physical understanding. Yet intuitive physical understanding has been shown to emerge from self-supervised learning (SSL) pretraining on natural videos. In this report, we investigate whether SSL-based video world models can be leveraged to improve the physical plausibility of video generative models. In particular, we build on top of the state-of-the-art video generative model MAGI-1 and couple it with the recently introduced Video Joint Embedding Predictive Architecture 2 (VJEPA-2) to guide the generation process. We show that by using VJEPA-2 as a reward signal, we can improve the physical plausibility of state-of-the-art video generative models by ~6%.
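The abstract does not specify how the VJEPA-2 reward is computed or applied during generation. One plausible way to turn a JEPA-style world model into a reward is to score its "surprise": how far its predicted embeddings for later frames are from the embeddings of the frames that were actually generated, with less surprise meaning a more physically plausible video. The sketch below is a hypothetical illustration under that assumption, not the authors' method. The names `encode`, `predict`, `surprise_reward`, and `best_of_n` are placeholders, the identity encoder and constant-velocity predictor are toy stand-ins for VJEPA-2, and best-of-N selection among candidate videos is only one assumed way such a reward could guide MAGI-1.

```python
from typing import Callable, List, Sequence

Vector = List[float]


def mse(a: Sequence[float], b: Sequence[float]) -> float:
    """Mean squared error between two equal-length embedding vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def surprise_reward(
    frames: Sequence[Vector],
    encode: Callable[[Vector], Vector],
    predict: Callable[[Sequence[Vector]], Vector],
    context: int = 2,
) -> float:
    """Hypothetical reward: negative mean prediction error ("surprise").

    For every frame t >= context, the predictor sees the embeddings of the
    previous `context` frames and predicts the embedding of frame t.
    A lower average error gives a higher (less negative) reward.
    """
    embeddings = [encode(f) for f in frames]
    errors = [
        mse(predict(embeddings[t - context:t]), embeddings[t])
        for t in range(context, len(embeddings))
    ]
    return -sum(errors) / len(errors)


def best_of_n(
    candidates: Sequence[Sequence[Vector]],
    encode: Callable[[Vector], Vector],
    predict: Callable[[Sequence[Vector]], Vector],
    context: int = 2,
) -> int:
    """Return the index of the candidate video with the highest reward."""
    rewards = [surprise_reward(c, encode, predict, context) for c in candidates]
    return max(range(len(candidates)), key=rewards.__getitem__)


# Toy stand-ins (assumptions, NOT VJEPA-2): each "frame" is a 1-D object
# position, the encoder is the identity, and the predictor extrapolates
# assuming constant velocity from the last two embeddings.
def toy_encode(frame: Vector) -> Vector:
    return list(frame)


def toy_predict(ctx: Sequence[Vector]) -> Vector:
    prev, last = ctx[-2], ctx[-1]
    return [2 * b - a for a, b in zip(prev, last)]


smooth = [[0.0], [1.0], [2.0], [3.0], [4.0]]    # object moves steadily
teleport = [[0.0], [1.0], [2.0], [7.0], [1.0]]  # object jumps implausibly

best = best_of_n([smooth, teleport], toy_encode, toy_predict)
print(best)  # index of the more physically plausible candidate
```

In this toy setup the steadily moving object is predicted perfectly (reward 0), while the teleporting one incurs a large prediction error, so the selector keeps the first candidate. With a real model, `toy_encode` and `toy_predict` would be replaced by the world model's encoder and predictor, whose actual interfaces are not described here.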
Similar Papers
PhysVideoGenerator: Towards Physically Aware Video Generation via Latent Physics Guidance
CV and Pattern Recognition
Makes videos look real, with correct physics.
From Video to EEG: Adapting Joint Embedding Predictive Architecture to Uncover Visual Concepts in Brain Signal Analysis
CV and Pattern Recognition
Helps doctors understand brain signals better.
VideoREPA: Learning Physics for Video Generation through Relational Alignment with Foundation Models
CV and Pattern Recognition
Makes videos follow real-world physics rules.