Foresight: Adaptive Layer Reuse for Accelerated and High-Quality Text-to-Video Generation
By: Muhammad Adnan, Nithesh Kurella, Akhil Arunkumar, and more
Potential Business Impact:
Makes AI video generation faster without losing quality.
Diffusion Transformers (DiTs) achieve state-of-the-art results in text-to-image generation, text-to-video generation, and editing. However, their large model size and the quadratic cost of spatial-temporal attention over multiple denoising steps make video generation computationally expensive. Static caching mitigates this by reusing features across fixed steps, but it fails to adapt to generation dynamics, leading to suboptimal trade-offs between speed and quality. We propose Foresight, an adaptive layer-reuse technique that reduces computational redundancy across denoising steps while preserving baseline performance. Foresight dynamically identifies and reuses DiT block outputs for all layers across steps, adapting to generation parameters such as resolution and denoising schedules to optimize efficiency. Applied to OpenSora, Latte, and CogVideoX, Foresight achieves up to \latencyimprv end-to-end speedup while maintaining video quality. The source code of Foresight is available at https://github.com/STAR-Laboratory/foresight.
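The core idea of adaptive layer reuse lends itself to a short sketch. Below is a minimal, hypothetical PyTorch illustration: each DiT block's output is cached, and on later denoising steps the block is skipped when its input has changed little since the cached computation. All names here (ToyDiTBlock, denoise_with_layer_reuse, reuse_threshold) and the relative-change test are illustrative assumptions, not the authors' method; Foresight's actual decision logic lives in the linked repository.

```python
import torch
import torch.nn as nn

class ToyDiTBlock(nn.Module):
    """Stand-in for a DiT transformer block; purely illustrative."""
    def __init__(self, dim):
        super().__init__()
        self.proj = nn.Linear(dim, dim)

    def forward(self, h, t):
        # A real block would apply timestep-conditioned attention + MLP.
        return h + 0.1 * torch.tanh(self.proj(h))

@torch.no_grad()
def denoise_with_layer_reuse(blocks, x, timesteps, reuse_threshold=0.05):
    """Run denoising steps, reusing a block's cached output whenever its
    input has barely changed since the step that produced the cache."""
    cache = [None] * len(blocks)  # per-block {input, output} of last compute
    for t in timesteps:
        h = x
        for i, block in enumerate(blocks):
            entry = cache[i]
            if entry is not None:
                # Relative change of the block input vs. the cached input.
                delta = (h - entry["inp"]).norm() / (entry["inp"].norm() + 1e-8)
                if delta.item() < reuse_threshold:
                    h = entry["out"]  # reuse: skip this block's compute
                    continue
            out = block(h, t)
            cache[i] = {"inp": h.clone(), "out": out.clone()}
            h = out
        x = h  # a real sampler would apply its scheduler update here
    return x

blocks = nn.ModuleList(ToyDiTBlock(64) for _ in range(4))
latent = torch.randn(1, 16, 64)  # (batch, tokens, channels)
denoised = denoise_with_layer_reuse(blocks, latent, timesteps=range(10))
```

In a full system, the reuse criterion and threshold would themselves adapt to generation parameters such as resolution and the denoising schedule, which is the adaptivity the abstract highlights over static caching.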
Similar Papers
QuantCache: Adaptive Importance-Guided Quantization with Hierarchical Latent and Layer Caching for Video Generation
CV and Pattern Recognition
Makes video creation faster without losing quality.
Forecast then Calibrate: Feature Caching as ODE for Efficient Diffusion Transformers
CV and Pattern Recognition
Makes AI picture and video creation faster.
MixCache: Mixture-of-Cache for Video Diffusion Transformer Acceleration
Graphics
Makes video creation faster without losing quality.