Aggregation of Dependent Expert Distributions in Multimodal Variational Autoencoders

Published: May 2, 2025 | arXiv ID: 2505.01134v1

By: Rogelio A. Mancisidor, Robert Jenssen, Shujian Yu, and more

Potential Business Impact:

Lets computers jointly understand different kinds of information, such as images and text.

Business Areas:
Collaborative Consumption, Collaboration

Multimodal learning with variational autoencoders (VAEs) requires estimating joint distributions to evaluate the evidence lower bound (ELBO). Current methods, the product and mixture of experts, aggregate single-modality distributions under a simplifying independence assumption, which is overoptimistic. This research introduces a novel methodology for aggregating single-modality distributions by exploiting the principle of consensus of dependent experts (CoDE), which circumvents that assumption. Using the CoDE method, we propose a novel ELBO that approximates the joint likelihood of the multimodal data by learning the contribution of each subset of modalities. The resulting CoDE-VAE model demonstrates better performance in balancing the trade-off between generative coherence and generative quality, as well as producing more precise log-likelihood estimates. CoDE-VAE further minimizes the generative quality gap as the number of modalities increases. In certain cases, it reaches a generative quality similar to that of unimodal VAEs, a desirable property that most current methods lack. Finally, the classification accuracy achieved by CoDE-VAE is comparable to that of state-of-the-art multimodal VAE models.
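To make the baselines concrete, the sketch below shows the two standard aggregators the abstract contrasts with, for Gaussian experts: a product of experts (PoE), whose independence assumption yields a precision-weighted combination, and a moment-matched mixture of experts (MoE). This is an illustrative sketch of the well-known closed forms only; it does not implement CoDE itself, which additionally models dependence between experts, and the function names are my own.

```python
import numpy as np

def product_of_experts(means, variances):
    """PoE of Gaussian experts under the independence assumption:
    the joint is Gaussian with precision equal to the sum of the
    experts' precisions and a precision-weighted mean."""
    means = np.asarray(means, dtype=float)        # shape (M, D)
    precisions = 1.0 / np.asarray(variances, dtype=float)
    joint_var = 1.0 / precisions.sum(axis=0)
    joint_mean = joint_var * (precisions * means).sum(axis=0)
    return joint_mean, joint_var

def mixture_of_experts(means, variances, weights=None):
    """MoE of Gaussian experts, summarized by moment matching:
    overall mean and variance of the (equally weighted) mixture."""
    means = np.asarray(means, dtype=float)        # shape (M, D)
    variances = np.asarray(variances, dtype=float)
    m = means.shape[0]
    w = np.full(m, 1.0 / m) if weights is None else np.asarray(weights, float)
    mix_mean = (w[:, None] * means).sum(axis=0)
    # E[x^2] - E[x]^2 over the mixture
    mix_var = (w[:, None] * (variances + means**2)).sum(axis=0) - mix_mean**2
    return mix_mean, mix_var

# Two 1-D experts at means 0 and 2, unit variance each:
poe_mean, poe_var = product_of_experts([[0.0], [2.0]], [[1.0], [1.0]])
moe_mean, moe_var = mixture_of_experts([[0.0], [2.0]], [[1.0], [1.0]])
```

Note how PoE sharpens the posterior (variance 0.5 here) while MoE broadens it (variance 2.0), which is one reason the two aggregators trade off coherence against quality differently; CoDE replaces the independence assumption both share.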

Country of Origin
🇳🇴 Norway

Repos / Data Links

Page Count
26 pages

Category
Computer Science:
Machine Learning (CS)