WorldForge: Unlocking Emergent 3D/4D Generation in Video Diffusion Model via Training-Free Guidance

Published: September 18, 2025 | arXiv ID: 2509.15130v2

By: Chenxi Song, Yanming Yang, Tong Zhao, and more

Potential Business Impact:

Makes AI-generated videos follow a specified camera trajectory closely, without retraining the video model.

Business Areas:

Autonomous Vehicles, Transportation

Recent video diffusion models show immense potential for spatial intelligence tasks due to their rich world priors, but this is undermined by limited controllability, poor spatial-temporal consistency, and entangled scene-camera dynamics. Existing solutions, such as model fine-tuning and warping-based repainting, struggle with scalability, generalization, and robustness against artifacts. To address this, we propose WorldForge, a training-free, inference-time framework composed of three tightly coupled modules. 1) Intra-Step Recursive Refinement injects fine-grained trajectory guidance at denoising steps through a recursive correction loop, ensuring motion remains aligned with the target path. 2) Flow-Gated Latent Fusion leverages optical flow similarity to decouple motion from appearance in the latent space and selectively inject trajectory guidance into motion-related channels. 3) Dual-Path Self-Corrective Guidance compares guided and unguided denoising paths to adaptively correct trajectory drift caused by noisy or misaligned structural signals. Together, these components inject fine-grained, trajectory-aligned guidance without training, achieving both accurate motion control and photorealistic content generation. Our framework is plug-and-play and model-agnostic, enabling broad applicability across various 3D/4D tasks. Extensive experiments demonstrate that our method achieves state-of-the-art performance in trajectory adherence, geometric consistency, and perceptual quality, outperforming both training-intensive and inference-only baselines.
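The abstract describes the three modules only at a high level. The toy sketch below, in plain Python, illustrates the general pattern they suggest: a predict, inject, re-noise loop repeated at a single noise level; a per-channel gate that injects guidance only where a channel's temporal dynamics resemble the target's; and a comparison of guided and unguided predictions that limits drift. It is not WorldForge's implementation. Everything here is an assumption: the latent is reduced to `latent[channel][frame]` scalars, `toy_denoiser` is a hypothetical stand-in for a video diffusion model under the schedule x_t = (1 - t)·x0 + t·eps, frame-to-frame differences stand in for optical flow, `guide` is a hypothetical latent rendered along the target trajectory, and the drift-capping rule in `dual_path_correct` is invented for illustration.

```python
import math
from typing import List

Latent = List[List[float]]  # latent[c][f]: channel c at frame f (spatial dims dropped)


def toy_denoiser(x_t: Latent, t: float) -> Latent:
    """Hypothetical stand-in for a video diffusion model.

    Assumes x_t = (1 - t) * x0 + t * eps with x0, eps ~ N(0, 1); under that
    model the posterior mean of x0 is (1 - t) * x_t / ((1 - t)**2 + t**2).
    """
    a, b = 1.0 - t, t
    return [[a * v / (a * a + b * b) for v in ch] for ch in x_t]


def temporal_diff(ch: List[float]) -> List[float]:
    # Frame-to-frame differences: a crude stand-in for optical flow.
    return [ch[i + 1] - ch[i] for i in range(len(ch) - 1)]


def cosine(u: List[float], v: List[float]) -> float:
    nu = math.sqrt(sum(x * x for x in u))
    nv = math.sqrt(sum(x * x for x in v))
    if nu == 0.0 or nv == 0.0:
        return 0.0
    return sum(x * y for x, y in zip(u, v)) / (nu * nv)


def flow_gate(x0: Latent, guide: Latent, tau: float) -> List[bool]:
    # A channel counts as "motion-related" when its dynamics resemble the guide's.
    return [cosine(temporal_diff(p), temporal_diff(g)) >= tau for p, g in zip(x0, guide)]


def refine_step(x_t: Latent, t: float, guide: Latent,
                strength: float = 0.5, tau: float = 0.5, iters: int = 3) -> Latent:
    """Recursive predict -> inject -> re-noise loop at one noise level (0 < t < 1)."""
    a, b = 1.0 - t, t
    for _ in range(iters):
        x0 = toy_denoiser(x_t, t)
        gates = flow_gate(x0, guide, tau)
        # Blend guidance into gated channels only; other channels keep their prediction.
        fused = [
            [(1 - strength) * p + strength * g for p, g in zip(pc, gc)] if on else list(pc)
            for pc, gc, on in zip(x0, guide, gates)
        ]
        # Recover the implied noise, then re-noise the fused estimate back to level t.
        eps = [[(v - a * p) / b for v, p in zip(vc, pc)] for vc, pc in zip(x_t, x0)]
        x_t = [[a * f + b * e for f, e in zip(fc, ec)] for fc, ec in zip(fused, eps)]
    return x_t


def dual_path_correct(x_t: Latent, t: float, guide: Latent,
                      max_drift: float = 0.5, **kw) -> Latent:
    """Hypothetical dual-path rule: per channel, shrink the guided x0 toward the
    unguided x0 so their mean absolute gap never exceeds max_drift."""
    unguided = toy_denoiser(x_t, t)
    guided = toy_denoiser(refine_step(x_t, t, guide, **kw), t)
    out = []
    for u, g in zip(unguided, guided):
        drift = sum(abs(gi - ui) for gi, ui in zip(g, u)) / len(g)
        alpha = 1.0 if drift <= max_drift else max_drift / drift
        out.append([ui + alpha * (gi - ui) for gi, ui in zip(g, u)])
    return out


if __name__ == "__main__":
    x_t = [[0.0, 1.0, 2.0, 3.0], [1.0, 1.0, 1.0, 1.0]]
    guide = [[0.0, 2.0, 4.0, 6.0], [5.0, 5.0, 5.0, 5.0]]
    print(refine_step(x_t, 0.5, guide))
    # -> [[0.0, 1.578125, 3.15625, 4.734375], [1.0, 1.0, 1.0, 1.0]]
    print(dual_path_correct(x_t, 0.5, guide))
```

In this toy run, channel 0 moves toward the guide's faster motion over the three inner iterations, while channel 1, whose frames are static, fails the gate and keeps its values, which loosely mirrors the idea of steering motion without overwriting appearance. The dual-path step then caps how far the guided prediction may drift from the unguided one; the real method's correction rule may differ.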

Country of Origin

🇸🇬 Singapore

Page Count

25 pages