Variational decomposition autoencoding improves disentanglement of latent representations
By: Ioannis Ziogas, Aamna Al Shehhi, Ahsan H. Khandoker and more
Understanding the structure of complex, nonstationary, high-dimensional time-evolving signals is a central challenge in scientific data analysis. In many domains, such as speech and biomedical signal processing, the ability to learn disentangled and interpretable representations is critical for uncovering latent generative mechanisms. Traditional approaches to unsupervised representation learning, including variational autoencoders (VAEs), often struggle to capture the temporal and spectral diversity inherent in such data. Here we introduce variational decomposition autoencoding (VDA), a framework that extends VAEs by incorporating a strong structural bias toward signal decomposition. VDA is instantiated through variational decomposition autoencoders (DecVAEs), i.e., encoder-only neural networks that combine a signal decomposition model, a contrastive self-supervised task, and variational prior approximation to learn multiple latent subspaces aligned with time-frequency characteristics. We demonstrate the effectiveness of DecVAEs on simulated data and three publicly available scientific datasets, spanning speech recognition, dysarthria severity evaluation, and emotional speech classification. Our results show that DecVAEs surpass state-of-the-art VAE-based methods in terms of disentanglement quality, generalization across tasks, and the interpretability of latent encodings. These findings suggest that decomposition-aware architectures can serve as robust tools for extracting structured representations from dynamic signals, with potential applications in clinical diagnostics, human-computer interaction, and adaptive neurotechnologies.
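To make the idea of "multiple latent subspaces aligned with time-frequency characteristics" more concrete, here is a minimal NumPy sketch under explicit assumptions. It is not the authors' DecVAE. The decomposition model is replaced by simple FFT band masking, the encoder is a hypothetical random linear map, and the contrastive self-supervised task is omitted. Every function name (`band_decompose`, `encode_subspaces`, `reparameterize`, `gaussian_kl`) is illustrative. The sketch shows only the general pattern: decompose a signal into components, give each component its own diagonal-Gaussian latent subspace, and penalize each subspace with a KL term against a standard normal prior, as in a standard VAE.

```python
import numpy as np


def band_decompose(x: np.ndarray, n_bands: int) -> np.ndarray:
    """Split a 1-D signal into n_bands components by masking contiguous FFT bins.

    Hypothetical stand-in for a signal decomposition model (NOT the paper's).
    Because the bands partition the spectrum, the components sum back to x.
    """
    X = np.fft.rfft(x)
    edges = np.linspace(0, len(X), n_bands + 1).astype(int)
    components = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        masked = np.zeros_like(X)
        masked[lo:hi] = X[lo:hi]  # keep only this band's frequency bins
        components.append(np.fft.irfft(masked, n=len(x)))
    return np.stack(components)  # shape (n_bands, len(x))


def encode_subspaces(components, latent_dim, rng):
    """Map each component to its own diagonal-Gaussian latent subspace.

    The random linear maps are hypothetical placeholders for a trained encoder.
    """
    n_bands, T = components.shape
    W_mu = rng.standard_normal((n_bands, T, latent_dim)) / np.sqrt(T)
    W_logvar = 0.01 * rng.standard_normal((n_bands, T, latent_dim))
    mu = np.einsum("kt,ktd->kd", components, W_mu)
    logvar = np.einsum("kt,ktd->kd", components, W_logvar)
    return mu, logvar  # each of shape (n_bands, latent_dim)


def reparameterize(mu, logvar, rng):
    """Sample z = mu + sigma * eps with eps ~ N(0, I) (VAE reparameterization trick)."""
    return mu + np.exp(0.5 * logvar) * rng.standard_normal(mu.shape)


def gaussian_kl(mu, logvar):
    """KL(N(mu, diag(exp(logvar))) || N(0, I)), summed over latent dimensions."""
    return 0.5 * np.sum(np.exp(logvar) + mu**2 - 1.0 - logvar, axis=-1)


# Usage: a two-tone signal split into 4 bands, one latent subspace per band.
rng = np.random.default_rng(0)
t = np.linspace(0.0, 1.0, 256, endpoint=False)
x = np.sin(2 * np.pi * 5 * t) + 0.5 * np.sin(2 * np.pi * 60 * t)
comps = band_decompose(x, n_bands=4)
mu, logvar = encode_subspaces(comps, latent_dim=8, rng=rng)
z = reparameterize(mu, logvar, rng)
kl_per_subspace = gaussian_kl(mu, logvar)
```

In this toy setup the 5 Hz tone lands in the first band and the 60 Hz tone in the second, so the two sources end up in separate latent subspaces, while the empty upper bands carry near-zero KL. Keeping one KL term per subspace (rather than one for the whole latent vector) is what would allow a decomposition-aware model to weight or inspect each subspace independently.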
Similar Papers
DeCo-VAE: Learning Compact Latents for Video Reconstruction via Decoupled Representation
CV and Pattern Recognition
Makes videos smaller by separating key parts.
An Introduction to Discrete Variational Autoencoders
Machine Learning (CS)
Teaches computers to understand words by grouping them.
Disentanglement of Sources in a Multi-Stream Variational Autoencoder
Machine Learning (Stat)
Separates sounds and writings into their own parts.