Why Heuristic Weighting Works: A Theoretical Analysis of Denoising Score Matching
By: Juyan Zhang, Rhys Newbury, Xinyang Zhang, and more
Potential Business Impact:
Makes AI better at cleaning up messy pictures.
Score matching enables the estimation of the gradient of a data distribution, a key component of denoising diffusion models used to recover clean data from corrupted inputs. In prior work, a heuristic weighting function has been used for the denoising score matching loss without formal justification. In this work, we demonstrate that heteroskedasticity is an inherent property of the denoising score matching objective. This insight leads to a principled derivation of optimal weighting functions for generalized, arbitrary-order denoising score matching losses, without requiring assumptions about the noise distribution. Among these, the first-order formulation is especially relevant to diffusion models. We show that the widely used heuristic weighting function arises as a first-order Taylor approximation to the trace of the expected optimal weighting. We further provide theoretical and empirical comparisons, revealing that the heuristic weighting, despite its simplicity, can achieve lower variance in the parameter gradients than the optimal weighting, which facilitates more stable and efficient training.
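To make the weighting concrete, below is a minimal PyTorch sketch of the first-order denoising score matching loss with the standard heuristic weighting λ(σ) = σ² from the diffusion literature, under which the objective reduces to the familiar noise-prediction loss. The `score_model(x_tilde, sigma)` interface is a hypothetical placeholder, and the paper's derived optimal weighting is not reproduced here.

```python
import torch

def heuristic_dsm_loss(score_model, x, sigma):
    """First-order denoising score matching loss at a single noise level
    sigma, weighted by the heuristic lambda(sigma) = sigma**2.

    Sketch only: `score_model(x_tilde, sigma)` is a hypothetical interface
    returning s_theta(x_tilde) with the same shape as x.
    """
    eps = torch.randn_like(x)       # Gaussian corruption noise
    x_tilde = x + sigma * eps       # corrupted sample
    target = -eps / sigma           # score of the Gaussian kernel q(x_tilde | x)
    score = score_model(x_tilde, sigma)
    # Unweighted, the squared residual scales like 1/sigma**2 across noise
    # levels (heteroskedastic); lambda(sigma) = sigma**2 rescales it so each
    # noise level contributes on a comparable scale.
    residual = (score - target) ** 2
    per_sample = residual.flatten(start_dim=1).sum(dim=1)
    return (sigma ** 2 * per_sample).mean()

# Illustrative usage with a toy linear "score model":
model = torch.nn.Linear(8, 8)
x = torch.randn(32, 8)
loss = heuristic_dsm_loss(lambda xt, s: model(xt), x, sigma=0.5)
loss.backward()
```

One intuitive reading of the heteroskedasticity argument: with λ(σ) = σ², the weighted residual equals ||σ·s_θ(x̃) + ε||², whose target ε has unit variance at every noise level, so no single level dominates the gradient signal.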
Similar Papers
Are We Really Learning the Score Function? Reinterpreting Diffusion Models Through Wasserstein Gradient Flow Matching
Machine Learning (CS)
Makes AI create realistic pictures by learning data movement.
Score-Based Density Estimation from Pairwise Comparisons
Machine Learning (CS)
Teaches computers to guess what people prefer.
Convergence Dynamics of Over-Parameterized Score Matching for a Single Gaussian
Machine Learning (CS)
Teaches computers to learn patterns from data.