From Tail Universality to Bernstein-von Mises: A Unified Statistical Theory of Semi-Implicit Variational Inference
By: Sean Plummer
Potential Business Impact:
Helps computers learn better from less data.
Semi-implicit variational inference (SIVI) constructs approximate posteriors of the form $q(\theta) = \int k(\theta \mid z)\, r(dz)$, where the conditional kernel is parameterized and the mixing base is fixed and tractable. This paper develops a unified "approximation-optimization-statistics" theory for such families. On the approximation side, we show that under compact $L^1$-universality and a mild tail-dominance condition, semi-implicit families are dense in $L^1$ and can achieve arbitrarily small forward Kullback-Leibler (KL) error. We also identify two sharp obstructions to global approximation: (i) an Orlicz tail-mismatch condition that induces a strictly positive forward-KL gap, and (ii) structural restrictions, such as non-autoregressive Gaussian kernels, that force "branch collapse" in conditional distributions. For each obstruction we give a minimal structural modification that restores approximability. On the optimization side, we establish finite-sample oracle inequalities and prove that the empirical SIVI objectives $L_{K,n}$ $\Gamma$-converge to their population limit as $n$ and $K$ tend to infinity. These results give consistency of empirical maximizers, quantitative control of finite-$K$ surrogate bias, and stability of the resulting variational posteriors. Combining the approximation and optimization analyses yields the first general end-to-end statistical theory for SIVI: we characterize precisely when SIVI can recover the target distribution, when it cannot, and how architectural and algorithmic choices govern the attainable asymptotic behavior.
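To make the construction concrete, here is a minimal sketch of a one-dimensional semi-implicit family. It assumes a standard Gaussian mixing base $r = N(0,1)$ and a Gaussian conditional kernel $k(\theta \mid z) = N(\theta; z, \sigma^2)$ with a fixed bandwidth; these choices, and all function names, are illustrative stand-ins for the parameterized kernels studied in the paper, not the authors' implementation. Sampling from $q$ is trivial (draw $z$, then $\theta \mid z$), while the density is only available through the finite-$K$ Monte Carlo surrogate $\hat q_K(\theta) = \frac{1}{K}\sum_{k=1}^{K} k(\theta \mid z_k)$ that underlies the empirical objectives discussed in the abstract.

```python
import numpy as np

rng = np.random.default_rng(0)

def kernel_density(theta, z, sigma=0.5):
    # Gaussian conditional kernel k(theta | z) = N(theta; z, sigma^2)
    return np.exp(-0.5 * ((theta - z) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

def sivi_sample(n, sigma=0.5):
    # Sampling from q is implicit but easy: z ~ r = N(0,1), then theta ~ k(.|z)
    z = rng.standard_normal(n)
    return z + sigma * rng.standard_normal(n)

def sivi_density_estimate(theta, K=5000, sigma=0.5):
    # Finite-K Monte Carlo surrogate for q(theta) = \int k(theta|z) r(dz);
    # this is the quantity whose bias the paper's oracle inequalities control.
    z = rng.standard_normal(K)
    return kernel_density(theta, z, sigma).mean()

samples = sivi_sample(10_000)          # draws from q = N(0, 1 + sigma^2)
q_hat = sivi_density_estimate(0.0)     # Monte Carlo estimate of q(0)
```

In this toy case $q$ is exactly $N(0, 1 + \sigma^2)$, so the surrogate can be checked against the closed-form density; for the parameterized, non-Gaussian kernels SIVI actually uses, no such closed form exists and the finite-$K$ estimate is all one has.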
Similar Papers
Semi-Implicit Variational Inference via Kernelized Path Gradient Descent
Machine Learning (CS)
Makes computer learning faster and more accurate.
Revisiting Unbiased Implicit Variational Inference
Machine Learning (CS)
Makes computer learning faster and more accurate.
Semi-Implicit Approaches for Large-Scale Bayesian Spatial Interpolation
Computation
Makes mapping faster and more accurate.