A Connection Between Score Matching and Local Intrinsic Dimension
By: Eric Yeats, Aaron Jacobson, Darryl Hannan, and more
Potential Business Impact:
Measures data complexity faster and with less memory.
The local intrinsic dimension (LID) of data is a fundamental quantity in signal processing and learning theory, but quantifying the LID of high-dimensional, complex data has historically been challenging. Recent works have discovered that diffusion models capture the LID of data through the spectra of their score estimates and through the rate of change of their density estimates under various noise perturbations. While these methods can accurately quantify LID, they require either many forward passes of the diffusion model or the use of gradient computation, limiting their applicability in compute- and memory-constrained scenarios. We show that the LID is a lower bound on the denoising score matching loss, motivating the use of the denoising score matching loss as an LID estimator. Moreover, we show that the equivalent implicit score matching loss also approximates LID via the normal dimension and is closely related to a recent LID estimator, FLIPD. Our experiments on a manifold benchmark and with Stable Diffusion 3.5 indicate that the denoising score matching loss is a highly competitive and scalable LID estimator, achieving superior accuracy and a smaller memory footprint as problem size and quantization level increase.
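To make the abstract's central claim concrete, here is a minimal toy sketch (not the paper's implementation) of reading LID off the denoising score matching loss. It assumes data on a d-dimensional linear subspace of R^D and replaces a trained score network with the closed-form near-optimal score of the noised distribution: its normal components are -x_perp / sigma^2 and its tangent components are roughly zero in the manifold interior. Under these assumptions, sigma^2 times the DSM loss recovers d; all variable names here are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
D, d, sigma, n = 10, 3, 0.05, 20000

# Toy manifold: data supported on a d-dimensional linear subspace of R^D.
x = np.zeros((n, D))
x[:, :d] = rng.uniform(-1.0, 1.0, size=(n, d))

# Perturb with isotropic Gaussian noise of scale sigma.
eps = sigma * rng.standard_normal((n, D))
x_tilde = x + eps

# Stand-in for a trained score network: the near-optimal score of the
# noised distribution. Normal components: -x_perp / sigma^2; tangent
# components approximately 0 away from the manifold boundary.
score = np.zeros_like(x_tilde)
score[:, d:] = -x_tilde[:, d:] / sigma**2

# Denoising score matching loss: E || s(x~) - (x - x~)/sigma^2 ||^2.
target = (x - x_tilde) / sigma**2
dsm_loss = np.mean(np.sum((score - target) ** 2, axis=1))

# The normal components cancel, leaving E||eps_tangent||^2 / sigma^4
# = d / sigma^2, so sigma^2 * dsm_loss approximates the LID d.
lid_estimate = sigma**2 * dsm_loss
print(lid_estimate)
```

Running this prints a value close to 3, the intrinsic dimension of the toy data; only averages of the score field are needed, which is why such an estimator avoids the repeated forward passes and gradient computations the abstract mentions.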
Similar Papers
Local Intrinsic Dimensionality of Ground Motion Data for Early Detection of Complex Catastrophic Slope Failure
Machine Learning (CS)
Spots landslides early by watching ground movement.
Implicit score matching meets denoising score matching: improved rates of convergence and log-density Hessian estimation
Statistics Theory
Helps computers learn to create realistic images.
Model-free filtering in high dimensions via projection and score-based diffusions
Statistics Theory
Cleans up messy data to find hidden patterns.