FloodDiffusion: Tailored Diffusion Forcing for Streaming Motion Generation
By: Yiyi Cai, Yuhan Wu, Kunhang Li, and more
Potential Business Impact:
Creates realistic human movements from text.
We present FloodDiffusion, a new framework for text-driven, streaming human motion generation. Given time-varying text prompts, FloodDiffusion generates text-aligned, seamless motion sequences with real-time latency. Unlike existing methods that rely on chunk-by-chunk generation or auto-regressive models with diffusion heads, we adopt a diffusion forcing framework to model this time-series generation task under time-varying control events. We find that a straightforward implementation of vanilla diffusion forcing (as proposed for video models) fails to model real motion distributions. We show that, to guarantee modeling the output distribution, vanilla diffusion forcing must be tailored to: (i) train with bi-directional attention instead of causal attention; (ii) use a lower triangular time scheduler instead of a random one; (iii) introduce text conditioning in a continuous, time-varying manner. With these improvements, we demonstrate for the first time that a diffusion forcing-based framework achieves state-of-the-art performance on the streaming motion generation task, reaching an FID of 0.057 on the HumanML3D benchmark. Models, code, and weights are available at https://shandaai.github.io/FloodDiffusion/
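To make the three modifications concrete, below is a minimal, illustrative PyTorch sketch of what such a tailored diffusion-forcing denoiser could look like. It is not the authors' released implementation: the names (FrameDenoiser, lower_triangular_timesteps), dimensions, and the exact schedule shape are assumptions chosen only to show (i) bi-directional attention over motion frames, (ii) per-frame noise levels that follow a lower-triangular pattern across denoising steps, and (iii) per-frame (time-varying) text conditioning.

import torch
import torch.nn as nn

def lower_triangular_timesteps(num_frames, num_steps, max_t=1000):
    # Per-frame noise levels: earlier frames carry less noise than later ones,
    # and each denoising step lowers every frame's level. Stacking the steps
    # gives a lower-triangular pattern over (steps x frames), in contrast to
    # the independent random timesteps of vanilla diffusion forcing.
    steps = torch.arange(num_steps).unsqueeze(1)     # (num_steps, 1)
    frames = torch.arange(num_frames).unsqueeze(0)   # (1, num_frames)
    t = frames * (max_t // num_frames) - steps * (max_t // num_steps)
    return t.clamp(0, max_t - 1)                     # (num_steps, num_frames)

class FrameDenoiser(nn.Module):
    # Bi-directional transformer over motion frames (no causal mask),
    # conditioned on a per-frame, time-varying text embedding.
    def __init__(self, motion_dim=263, text_dim=512, hidden=512, layers=4, heads=8):
        super().__init__()
        self.in_proj = nn.Linear(motion_dim, hidden)
        self.text_proj = nn.Linear(text_dim, hidden)
        self.t_embed = nn.Embedding(1000, hidden)
        enc_layer = nn.TransformerEncoderLayer(hidden, heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(enc_layer, layers)
        self.out_proj = nn.Linear(hidden, motion_dim)

    def forward(self, noisy_motion, t_per_frame, text_per_frame):
        # noisy_motion: (B, T, motion_dim); t_per_frame: (B, T) integer noise levels;
        # text_per_frame: (B, T, text_dim), the prompt embedding active at each frame.
        h = (self.in_proj(noisy_motion)
             + self.t_embed(t_per_frame)
             + self.text_proj(text_per_frame))
        h = self.encoder(h)          # full bi-directional attention, no causal mask
        return self.out_proj(h)      # prediction of the clean motion frames

if __name__ == "__main__":
    B, T = 2, 16
    model = FrameDenoiser()
    x = torch.randn(B, T, 263)                       # 263-dim HumanML3D-style features
    txt = torch.randn(B, T, 512)                     # placeholder per-frame text features
    sched = lower_triangular_timesteps(T, num_steps=8)
    t = sched[0].unsqueeze(0).expand(B, T)           # noise levels at the first step
    out = model(x, t, txt)
    print(out.shape)                                 # torch.Size([2, 16, 263])

In this sketch, streaming behavior comes from the schedule rather than a causal mask: the oldest frames reach low noise first and can be emitted while newer frames are still being denoised, and the per-frame text embeddings let the prompt change mid-sequence.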
Similar Papers
StreamDiffusionV2: A Streaming System for Dynamic and Interactive Video Generation
CV and Pattern Recognition
Makes live videos change instantly as you create them.
Flood-LDM: Generalizable Latent Diffusion Models for rapid and accurate zero-shot High-Resolution Flood Mapping
CV and Pattern Recognition
Predicts floods faster and more accurately.
StreamingTalker: Audio-driven 3D Facial Animation with Autoregressive Diffusion Model
CV and Pattern Recognition
Makes computer faces talk in real-time.