Near-Optimality of Contrastive Divergence Algorithms
By: Pierre Glaser, Kevin Han Huang, Arthur Gretton
Potential Business Impact:
Shows that a common training method for machine learning models can be both fast and as statistically accurate as theoretically possible.
We perform a non-asymptotic analysis of the contrastive divergence (CD) algorithm, a training method for unnormalized models. While prior work has established that (for exponential family distributions) the CD iterates asymptotically converge at an $O(n^{-1/3})$ rate to the true parameter of the data distribution, we show, under some regularity assumptions, that CD can achieve the parametric rate $O(n^{-1/2})$. Our analysis provides results for various data batching schemes, including fully online and minibatch variants. We additionally show that CD can be near-optimal, in the sense that its asymptotic variance is close to the Cramér-Rao lower bound.
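
The abstract describes the CD update only at a high level. As a rough illustration of the setting, here is a minimal sketch of a minibatch CD-1 update for a one-dimensional exponential family, assuming a Metropolis-Hastings kernel for the negative samples; the model, step sizes, and helper names (`T`, `mh_step`, `cd_update`) are illustrative assumptions, not taken from the paper.

```python
# Minimal sketch (not the paper's code): minibatch CD-1 for a 1-D exponential
# family p_theta(x) ∝ exp(theta * T(x) - x**2 / 2), i.e. a Gaussian with
# unknown mean, so the true parameter equals the data mean.
import numpy as np

rng = np.random.default_rng(0)

def T(x):
    return x  # sufficient statistic

def log_unnorm(x, theta):
    return theta * T(x) - 0.5 * x ** 2  # unnormalized log-density

def mh_step(x, theta, step=1.0):
    """One Metropolis-Hastings step leaving p_theta invariant (assumed kernel)."""
    prop = x + step * rng.standard_normal(x.shape)
    log_accept = log_unnorm(prop, theta) - log_unnorm(x, theta)
    accept = np.log(rng.random(x.shape)) < log_accept
    return np.where(accept, prop, x)

def cd_update(theta, batch, lr):
    """CD-1: contrast T on the data with T after one MCMC step started at the data."""
    neg = mh_step(batch, theta)                 # negative samples
    grad = T(batch).mean() - T(neg).mean()      # CD gradient estimate
    return theta + lr * grad

# Synthetic data from the true model with theta* = 2.0
data = rng.normal(loc=2.0, scale=1.0, size=10_000)

theta, lr, batch_size = 0.0, 0.1, 100
for _ in range(2_000):
    batch = rng.choice(data, size=batch_size, replace=False)
    theta = cd_update(theta, batch, lr)

print(f"estimated theta = {theta:.3f} (true value 2.0)")
```

In this toy model the printed estimate should land near the true value of 2.0; the paper's contribution concerns how quickly such CD estimates approach the truth as the sample size $n$ grows, and how their variance compares to the Cramér-Rao bound.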
Similar Papers
Divergence-Minimization for Latent-Structure Models: Monotone Operators, Contraction Guarantees, and Robust Inference
Statistics Theory
Makes computer models more accurate and reliable.
Stability-based Generalization Analysis of Randomized Coordinate Descent for Pairwise Learning
Machine Learning (CS)
Improves computer learning by finding better ways to compare things.
On Flow Matching KL Divergence
Machine Learning (CS)
Makes AI learn data more accurately and faster.