MPQ-DMv2: Flexible Residual Mixed Precision Quantization for Low-Bit Diffusion Models with Temporal Distillation

Published: July 6, 2025 | arXiv ID: 2507.04290v1

By: Weilun Feng, Chuanguang Yang, Haotong Qin and more

Potential Business Impact:

Makes AI image makers work on small devices.

Business Areas:

DSP Hardware

Diffusion models have demonstrated remarkable performance on vision generation tasks. However, their high computational complexity hinders wide deployment on edge devices. Quantization has emerged as a promising technique for inference acceleration and memory reduction, but existing quantization methods do not generalize well under extremely low-bit (2-4 bit) quantization, and applying them directly causes severe performance degradation. We identify that the existing quantization framework suffers from an outlier-unfriendly quantizer design and from suboptimal initialization and optimization strategies. We present MPQ-DMv2, an improved Mixed Precision Quantization framework for extremely low-bit Diffusion Models. From the quantization perspective, the imbalanced distribution caused by salient outliers is unfriendly to a uniform quantizer. We propose Flexible Z-Order Residual Mixed Quantization, which uses an efficient binary residual branch to provide flexible quantization steps that handle salient errors. For the optimization framework, we theoretically analyze the convergence and optimality of the LoRA module and propose Object-Oriented Low-Rank Initialization, which uses the prior quantization error for informative initialization. We then propose Memory-based Temporal Relation Distillation, which constructs an online time-aware pixel queue for distilling long-term denoising temporal information, ensuring overall temporal consistency between the quantized and full-precision models. Comprehensive experiments on various generation tasks show that MPQ-DMv2 surpasses current SOTA methods by a large margin on different architectures, especially under extremely low bit-widths.
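Two ideas from the abstract can be illustrated with a toy example: adding a 1-bit residual branch on top of a uniform quantizer, and initializing a low-rank (LoRA-style) correction from the remaining quantization error. The sketch below is not the paper's method. The Z-order scheme and the paper's initialization analysis are not described on this page, so plain sign-based binarization and a truncated SVD are swapped in as stand-ins. All function names, the matrix size, the outlier value, and the rank are assumptions chosen for illustration.

```python
import numpy as np

def uniform_quantize(w, bits):
    """Min-max uniform quantization, returned in dequantized (float) form."""
    levels = 2 ** bits - 1
    lo, hi = w.min(), w.max()
    scale = (hi - lo) / levels if hi > lo else 1.0
    q = np.clip(np.round((w - lo) / scale), 0, levels)
    return q * scale + lo

def binary_residual(err):
    """1-bit approximation of an error tensor: sign(err) * mean(|err|)."""
    alpha = np.abs(err).mean()
    return alpha * np.sign(err)

def low_rank_init(err, rank):
    """Truncated SVD: B @ A is the best rank-`rank` approximation of err."""
    u, s, vt = np.linalg.svd(err, full_matrices=False)
    B = u[:, :rank] * s[:rank]   # shape (rows, rank)
    A = vt[:rank]                # shape (rank, cols)
    return B, A

rng = np.random.default_rng(0)
w = rng.normal(size=(64, 64))
w[0, 0] = 12.0  # hypothetical salient outlier that stretches the min-max range

w_q = uniform_quantize(w, bits=2)           # plain 2-bit uniform quantizer
w_res = w_q + binary_residual(w - w_q)      # add the 1-bit residual branch
B, A = low_rank_init(w - w_res, rank=8)     # init low-rank factors from the error

mse_base = np.mean((w - w_q) ** 2)
mse_res = np.mean((w - w_res) ** 2)
mse_lora = np.mean((w - (w_res + B @ A)) ** 2)
print(f"uniform: {mse_base:.4f}  +residual: {mse_res:.4f}  +low-rank: {mse_lora:.4f}")
```

In this toy setting each step lowers the reconstruction error: the sign-based residual removes the mean absolute error from every element, and the truncated SVD captures the largest remaining error directions. Whether the same ordering holds for real diffusion model weights under the paper's scheme is not something this sketch can show.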

SQ-DM: Accelerating Diffusion Models with Aggressive Quantization and Temporal Sparsity

CV and Pattern Recognition

Makes AI art creation much faster and cheaper.

26 Jan 2025 1

89%

Pioneering 4-Bit FP Quantization for Diffusion Models: Mixup-Sign Quantization and Timestep-Aware Fine-Tuning

Machine Learning (CS)

Makes AI image generators faster and smaller.

27 May 2025 0

89%

D$^2$-DPM: Dual Denoising for Quantized Diffusion Probabilistic Models

CV and Pattern Recognition

Makes AI art generators faster and smaller.

14 Jan 2025 2

Country of Origin

🇨🇭 🇨🇳 🇺🇸 United States, China, Switzerland

Page Count

15 pages
