InfoDPCCA: Information-Theoretic Dynamic Probabilistic Canonical Correlation Analysis
By: Shiqin Tang, Shujian Yu
Potential Business Impact:
Finds shared hidden patterns across two linked streams of data.
Extracting meaningful latent representations from high-dimensional sequential data is a crucial challenge in machine learning, with applications spanning the natural sciences and engineering. We introduce InfoDPCCA, a dynamic probabilistic Canonical Correlation Analysis (CCA) framework designed to model two interdependent sequences of observations. InfoDPCCA leverages a novel information-theoretic objective to extract a shared latent representation that captures the mutual structure between the data streams, balancing representation compression against predictive sufficiency, while also learning separate latent components that encode information specific to each sequence. Unlike prior dynamic CCA models such as DPCCA, our approach explicitly enforces the shared latent space to encode only the mutual information between the sequences, improving interpretability and robustness. We further introduce a two-step training scheme to bridge the gap between information-theoretic representation learning and generative modeling, along with a residual connection mechanism to enhance training stability. Through experiments on synthetic and medical fMRI data, we demonstrate that InfoDPCCA excels as a tool for representation learning. The code for InfoDPCCA is available at https://github.com/marcusstang/InfoDPCCA.
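As background to the abstract's contrast with CCA-style models, the sketch below shows how classical linear CCA recovers shared structure between two co-observed data streams. This is a minimal illustrative baseline, not the InfoDPCCA method itself; the function name `linear_cca` and the synthetic data are assumptions made here for demonstration.

```python
import numpy as np

def linear_cca(X, Y, k):
    """Top-k canonical directions and correlations via whitened SVD."""
    n = X.shape[0]
    Xc, Yc = X - X.mean(0), Y - Y.mean(0)
    # Regularized covariance and cross-covariance estimates.
    Cxx = Xc.T @ Xc / (n - 1) + 1e-6 * np.eye(X.shape[1])
    Cyy = Yc.T @ Yc / (n - 1) + 1e-6 * np.eye(Y.shape[1])
    Cxy = Xc.T @ Yc / (n - 1)
    # Whitening transforms from Cholesky factors.
    Lx_inv = np.linalg.inv(np.linalg.cholesky(Cxx))
    Ly_inv = np.linalg.inv(np.linalg.cholesky(Cyy))
    # SVD of the whitened cross-covariance gives canonical correlations.
    U, s, Vt = np.linalg.svd(Lx_inv @ Cxy @ Ly_inv.T)
    A = Lx_inv.T @ U[:, :k]   # canonical directions for X
    B = Ly_inv.T @ Vt[:k].T   # canonical directions for Y
    return A, B, s[:k]

# Two observation streams driven by one common latent signal z.
rng = np.random.default_rng(0)
z = rng.standard_normal((500, 1))
X = z @ rng.standard_normal((1, 5)) + 0.1 * rng.standard_normal((500, 5))
Y = z @ rng.standard_normal((1, 4)) + 0.1 * rng.standard_normal((500, 4))
A, B, corrs = linear_cca(X, Y, 2)
```

Because both streams share the single latent `z`, the leading canonical correlation is close to one while the second (noise-driven) one is much smaller; the dynamic, probabilistic, and information-theoretic machinery in InfoDPCCA generalizes this static linear picture to sequences.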
Similar Papers
Generalized probabilistic canonical correlation analysis for multi-modal data integration with full or partial observations
Machine Learning (Stat)
Combines different data types, even with missing parts.
Sparse canonical correlation analysis for multiple measurements with latent trajectories
Methodology
Finds hidden patterns in changing health data.
Two new approaches to multiple canonical correlation analysis for repeated measures data
Methodology
Finds hidden connections in complex, changing data.