DVD-Quant: Data-free Video Diffusion Transformers Quantization
By: Zhiteng Li, Hanxuan Li, Junyi Wu, and more
Potential Business Impact:
Makes AI video generation roughly 2× faster without visible quality loss.
Diffusion Transformers (DiTs) have emerged as the state-of-the-art architecture for video generation, yet their computational and memory demands hinder practical deployment. While post-training quantization (PTQ) presents a promising approach to accelerate Video DiT models, existing methods suffer from two critical limitations: (1) dependence on lengthy, computation-heavy calibration procedures, and (2) considerable performance deterioration after quantization. To address these challenges, we propose DVD-Quant, a novel Data-free quantization framework for Video DiTs. Our approach integrates three key innovations: (1) Progressive Bounded Quantization (PBQ) and (2) Auto-scaling Rotated Quantization (ARQ) for calibration data-free quantization error reduction, as well as (3) $\delta$-Guided Bit Switching ($\delta$-GBS) for adaptive bit-width allocation. Extensive experiments across multiple video generation benchmarks demonstrate that DVD-Quant achieves an approximately 2$\times$ speedup over full-precision baselines on HunyuanVideo while maintaining visual fidelity. Notably, DVD-Quant is the first to enable W4A4 PTQ for Video DiTs without compromising video quality. Code and models will be available at https://github.com/lhxcs/DVD-Quant.
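The core idea behind rotation-based quantization methods such as the paper's ARQ is that an orthogonal rotation spreads outlier channels across all dimensions before low-bit rounding, shrinking the per-tensor quantization step. The sketch below is not the paper's algorithm; it is a minimal illustration of that general principle, using a Hadamard rotation and symmetric 4-bit (W4-style) uniform quantization, with all function names chosen for this example.

```python
import numpy as np

def hadamard(n):
    # Sylvester construction of an orthonormal Hadamard matrix;
    # n must be a power of two.
    H = np.array([[1.0]])
    while H.shape[0] < n:
        H = np.block([[H, H], [H, -H]])
    return H / np.sqrt(n)

def quantize_w4(x):
    # Symmetric 4-bit uniform quantization: integer grid [-8, 7],
    # per-tensor scale set by the largest magnitude.
    scale = np.abs(x).max() / 7.0
    q = np.clip(np.round(x / scale), -8, 7)
    return q * scale

def rotated_quantize(x):
    # Rotate to spread channel outliers, quantize, rotate back.
    # The rotation is orthogonal, so the round-trip is exact up to
    # quantization error.
    H = hadamard(x.shape[-1])
    return quantize_w4(x @ H) @ H.T

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 64))
x[:, 0] *= 20.0  # inject an outlier channel, common in DiT activations

err_plain = np.abs(quantize_w4(x) - x).mean()
err_rot = np.abs(rotated_quantize(x) - x).mean()
print(f"plain W4 error:   {err_plain:.4f}")
print(f"rotated W4 error: {err_rot:.4f}")
```

With the outlier channel present, the rotated variant typically shows a much smaller mean absolute error, since the outlier no longer dictates the quantization scale for every channel.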
Similar Papers
Q-VDiT: Towards Accurate Quantization and Distillation of Video-Generation Diffusion Transformers
CV and Pattern Recognition
Makes video creation AI run on small devices.
LRQ-DiT: Log-Rotation Post-Training Quantization of Diffusion Transformers for Image and Video Generation
CV and Pattern Recognition
Makes AI image and video tools smaller, faster.
Post-Training Quantization for Audio Diffusion Transformers
Audio and Speech Processing
Makes AI music creation faster and smaller.