Identifiable factor analysis for mixed continuous and binary variables based on the Gaussian-Grassmann distribution

Published: December 11, 2025 | arXiv ID: 2512.10804v1

By: Takashi Arai

Potential Business Impact:

Finds hidden patterns in mixed data.

Business Areas:
A/B Testing Data and Analytics

We develop a factor analysis for mixed continuous and binary observed variables. To this end, we use a recently developed multivariate probability distribution for mixed-type random variables, the Gaussian-Grassmann distribution. In the proposed factor analysis, marginalization over latent variables can be performed analytically, yielding an analytical expression for the distribution of the observed variables. This analytical tractability allows model parameters to be estimated using standard gradient-based optimization techniques. We also address improper solutions associated with maximum likelihood factor analysis. We propose a prescription to avoid improper solutions by imposing a constraint that the row vectors of the factor loading matrix have the same norm for all features. We then prove that the proposed factor analysis is identifiable under the norm constraint. We demonstrate the validity of this norm-constraint prescription and numerically verify the model's identifiability using both real and synthetic datasets. We also compare the proposed model with the quantification method and find that the proposed model achieves better reproducibility of correlations than the quantification method.
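
As a rough illustration of the norm constraint on the factor loading matrix, the sketch below shows one possible way to keep every row at a common norm while still optimizing over unconstrained parameters. It is not the paper's implementation: the reparameterization, the function name `constrained_loadings`, the shared `log_norm` parameter, and the matrix sizes are all assumptions made for this example, and the Gaussian-Grassmann likelihood itself is not reproduced here.

```python
import numpy as np

def constrained_loadings(raw, log_norm):
    """Map an unconstrained matrix to a loading matrix whose rows share one norm.

    Hypothetical helper: each row of `raw` is rescaled to unit length and then
    multiplied by exp(log_norm), so every row of the result has the same norm.
    Rows of `raw` must be nonzero.
    """
    raw = np.asarray(raw, dtype=float)
    row_norms = np.linalg.norm(raw, axis=1, keepdims=True)
    return np.exp(log_norm) * raw / row_norms

# Illustrative sizes only: 5 observed features, 2 latent factors.
rng = np.random.default_rng(0)
raw = rng.normal(size=(5, 2))
W = constrained_loadings(raw, log_norm=np.log(0.8))

# Every row norm equals the shared value exp(log_norm).
row_norms = np.linalg.norm(W, axis=1)
print(row_norms)
```

With a parameterization of this kind, a gradient-based optimizer can update `raw` and `log_norm` freely, and the equal-norm condition holds by construction at every step.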

Country of Origin
🇯🇵 Japan

Page Count
25 pages

Category
Statistics: Methodology