Inference-time Physics Alignment of Video Generative Models with Latent World Models

Published: January 15, 2026 | arXiv ID: 2601.10553v1

By: Jianhao Yuan, Xiaofeng Zhang, Felix Friedrich and more

State-of-the-art video generative models produce promising visual content yet often violate basic physics principles, limiting their utility. While some attribute this deficiency to insufficient physics understanding from pre-training, we find that the shortfall in physics plausibility also stems from suboptimal inference strategies. We therefore introduce WMReward and treat improving the physics plausibility of video generation as an inference-time alignment problem. In particular, we leverage the strong physics prior of a latent world model (here, VJEPA-2) as a reward to search over and steer multiple candidate denoising trajectories, which allows test-time compute to be scaled for better generation performance. Empirically, our approach substantially improves physics plausibility across image-conditioned, multiframe-conditioned, and text-conditioned generation settings, as validated by a human preference study. Notably, in the ICCV 2025 Perception Test PhysicsIQ Challenge, we achieve a final score of 62.64%, winning first place and outperforming the previous state of the art by 7.42%. Our work demonstrates the viability of using latent world models to improve the physics plausibility of video generation, beyond this specific instantiation or parameterization.
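The "search" half of the idea in the abstract can be illustrated with a minimal best-of-N sketch: draw several candidate generations, score each with a reward, and keep the highest-scoring one. This is not the paper's implementation. The names `best_of_n`, `toy_sampler`, and `toy_reward` are hypothetical, the sampler stands in for a full denoising run, and the reward is a toy smoothness score rather than a VJEPA-2 based signal. Steering of individual trajectories during denoising is not covered here.

```python
import random
from typing import Callable, List, Tuple

# Hypothetical toy representation: a "video" is a list of per-frame positions.
Video = List[float]


def best_of_n(
    sample_video: Callable[[random.Random], Video],
    reward: Callable[[Video], float],
    n_candidates: int,
    seed: int = 0,
) -> Tuple[Video, float]:
    """Draw n_candidates samples and return the one with the highest reward."""
    if n_candidates < 1:
        raise ValueError("n_candidates must be at least 1")
    rng = random.Random(seed)
    candidates = [sample_video(rng) for _ in range(n_candidates)]
    scores = [reward(video) for video in candidates]
    best = max(range(n_candidates), key=scores.__getitem__)
    return candidates[best], scores[best]


def toy_sampler(rng: random.Random) -> Video:
    # Assumption: stands in for one complete denoising run of a video model.
    return [rng.gauss(0.0, 1.0) for _ in range(8)]


def toy_reward(video: Video) -> float:
    # Assumption: stands in for a world-model reward. It penalises large
    # frame-to-frame jumps, so smoother motion scores higher.
    return -sum((b - a) ** 2 for a, b in zip(video, video[1:]))


if __name__ == "__main__":
    for n in (1, 4, 16):
        _, score = best_of_n(toy_sampler, toy_reward, n_candidates=n, seed=0)
        print(f"N={n:2d}  best reward={score:.3f}")
```

With a fixed seed, the candidates drawn for a smaller N are a prefix of those drawn for a larger N, so the best reward can only stay the same or improve as N grows. In this toy setting, that is the sense in which extra test-time compute buys a better sample.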

PhysVideoGenerator: Towards Physically Aware Video Generation via Latent Physics Guidance

CV and Pattern Recognition

Makes videos look real, with correct physics.

7 Jan 2026 1

91%

Improving the Physics of Video Generation with VJEPA-2 Reward Signal

CV and Pattern Recognition

Makes computer videos follow real-world physics rules.

22 Oct 2025 3

90%

Bootstrapping Physics-Grounded Video Generation through VLM-Guided Iterative Self-Refinement

CV and Pattern Recognition

Makes videos follow real-world physics rules.

25 Nov 2025 0
