Hellinger Multimodal Variational Autoencoders
By: Huyen Khanh Vo, Isabel Valera
Potential Business Impact:
Helps computers learn better from different kinds of information.
Multimodal variational autoencoders (VAEs) are widely used for weakly supervised generative learning with multiple modalities. Predominant methods aggregate unimodal inference distributions using either a product of experts (PoE), a mixture of experts (MoE), or their combinations to approximate the joint posterior. In this work, we revisit multimodal inference through the lens of probabilistic opinion pooling, an optimization-based approach. We start from Hölder pooling with $α=0.5$, which corresponds to the unique symmetric member of the $α$-divergence family, and derive a moment-matching approximation, termed Hellinger. We then leverage this approximation to propose HELVAE, a multimodal VAE that avoids sub-sampling, yielding an efficient yet effective model that: (i) learns more expressive latent representations as additional modalities are observed; and (ii) empirically achieves better trade-offs between generative coherence and quality, outperforming state-of-the-art multimodal VAE models.
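To ground the aggregation schemes the abstract contrasts, the sketch below shows the two standard baselines for Gaussian unimodal posteriors: PoE, which has a closed form (precisions add, means are precision-weighted), and MoE, which relies on sub-sampling one expert at a time. This is a minimal illustration of the baselines only; the paper's Hellinger moment-matching aggregation is not reproduced here, and the function names are ours.

```python
import numpy as np

def poe_gaussian(mus, vars_):
    """Product of Gaussian experts (standard closed form):
    precisions add; the mean is the precision-weighted average."""
    prec = 1.0 / np.asarray(vars_)
    var = 1.0 / prec.sum(axis=0)
    mu = var * (prec * np.asarray(mus)).sum(axis=0)
    return mu, var

def moe_sample(mus, vars_, rng):
    """Mixture of Gaussian experts: sub-sample one expert
    uniformly, then draw from it (the sub-sampling HELVAE avoids)."""
    m = rng.integers(len(mus))
    return rng.normal(mus[m], np.sqrt(vars_[m]))

# Two unit-variance experts at 0 and 2:
# PoE gives mean 1.0 and variance 0.5.
mus = [np.array([0.0]), np.array([2.0])]
vars_ = [np.array([1.0]), np.array([1.0])]
mu, var = poe_gaussian(mus, vars_)
```

Note how PoE sharpens the posterior as experts are added (variance shrinks), while MoE keeps each expert's spread, which is one source of the coherence/quality trade-off the paper targets.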
Similar Papers
Bridging the inference gap in Multimodal Variational Autoencoders
Machine Learning (CS)
Creates better AI that understands different kinds of information.
Aggregation of Dependent Expert Distributions in Multimodal Variational Autoencoders
Machine Learning (CS)
Lets computers understand different kinds of information together.
M^2VAE: Multi-Modal Multi-View Variational Autoencoder for Cold-start Item Recommendation
Information Retrieval
Helps computers suggest new things you'll like.