Identifiable factor analysis for mixed continuous and binary variables based on the Gaussian-Grassmann distribution
By: Takashi Arai
Potential Business Impact:
Finds hidden patterns in mixed data.
We develop a factor analysis for mixed continuous and binary observed variables. To this end, we use a recently developed multivariate probability distribution for mixed-type random variables, the Gaussian-Grassmann distribution. In the proposed factor analysis, marginalization over latent variables can be performed analytically, yielding an analytical expression for the distribution of the observed variables. This analytical tractability allows model parameters to be estimated using standard gradient-based optimization techniques. We also address improper solutions associated with maximum likelihood factor analysis. We propose avoiding improper solutions by constraining the row vectors of the factor loading matrix to have the same norm for all features. We then prove that the proposed factor analysis is identifiable under this norm constraint. We demonstrate the validity of the norm constraint and numerically verify the model's identifiability using both real and synthetic datasets. We also compare the proposed model with the quantification method and find that the proposed model reproduces correlations better than the quantification method.
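The norm constraint described in the abstract can be illustrated with a minimal sketch. It is not taken from the paper: the function name `constrain_rows_to_common_norm`, the matrix shape, and the target norm of 0.8 are all hypothetical. The sketch assumes that one way to keep a gradient-based optimizer inside the constraint is to optimize an unconstrained matrix and rescale each row to a shared norm.

```python
import numpy as np


def constrain_rows_to_common_norm(raw_loadings, norm=1.0, eps=1e-12):
    """Rescale every row of a loading matrix to the same Euclidean norm.

    Hypothetical helper: an assumed reparameterization for illustration,
    not the estimation procedure used in the paper.
    """
    raw = np.asarray(raw_loadings, dtype=float)
    # One norm per feature (row); keepdims lets the division broadcast.
    row_norms = np.linalg.norm(raw, axis=1, keepdims=True)
    # Guard against division by zero for an all-zero row.
    return norm * raw / np.maximum(row_norms, eps)


# Illustrative shapes: 5 observed features, 2 latent factors.
rng = np.random.default_rng(0)
raw = rng.normal(size=(5, 2))
loadings = constrain_rows_to_common_norm(raw, norm=0.8)
row_norms = np.linalg.norm(loadings, axis=1)
```

After the rescaling, every entry of `row_norms` equals the chosen common norm up to floating-point error, so each feature's loading vector has the same length while its direction remains free.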
Similar Papers
Identifiability and Inference for Generalized Latent Factor Models
Methodology
Finds hidden patterns in data for better understanding.
Bayesian analysis of nonlinear structured latent factor models using a Gaussian Process Prior
Methodology
Finds hidden patterns in complex data.
Nonparametric Factor Analysis and Beyond
Machine Learning (CS)
Finds hidden causes even with messy data.