A Hybrid Mixture of $t$-Factor Analyzers for Clustering High-dimensional Data
By: Kazeem Kareem, Fan Dai
Potential Business Impact:
Finds hidden groups in messy data faster.
This paper develops a novel hybrid approach for estimating the mixture of $t$-factor analyzers (MtFA) model, which combines the multivariate $t$-distribution with a factor model to cluster and characterize grouped data. The traditional estimation method for MtFA faces computational challenges, particularly in high-dimensional settings, where the eigendecomposition of large covariance matrices and the iterative nature of the Expectation-Maximization (EM) algorithm lead to scalability issues. We propose a computational scheme that integrates a profile likelihood method into the EM framework to obtain the model parameter estimates efficiently. Simulations demonstrate that our approach is computationally more efficient than the existing method while preserving clustering accuracy and robustness to outliers. We apply our method to cluster gamma-ray bursts; the results reinforce several claims in the literature that gamma-ray bursts have heterogeneous subpopulations, and they provide characterizations of the estimated groups.
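For orientation, the MtFA model named in the abstract is commonly written in the form below. This is a sketch of the standard formulation, not the paper's own notation: the symbols $K$, $w_k$, $\mu_k$, $\Lambda_k$, $\Psi_k$, $\nu_k$, $p$, and $q$ are assumptions introduced here for illustration.

$$
f(x) = \sum_{k=1}^{K} w_k \, t_p\!\left(x;\, \mu_k,\, \Sigma_k,\, \nu_k\right),
\qquad
\Sigma_k = \Lambda_k \Lambda_k^{\top} + \Psi_k,
\qquad
w_k > 0,\quad \sum_{k=1}^{K} w_k = 1,
$$

$$
t_p(x;\, \mu,\, \Sigma,\, \nu)
= \frac{\Gamma\!\left(\frac{\nu + p}{2}\right)}
       {\Gamma\!\left(\frac{\nu}{2}\right) (\nu\pi)^{p/2}\, |\Sigma|^{1/2}}
  \left(1 + \frac{(x - \mu)^{\top} \Sigma^{-1} (x - \mu)}{\nu}\right)^{-\frac{\nu + p}{2}}.
$$

Here $\Lambda_k$ is a $p \times q$ loading matrix with $q < p$ and $\Psi_k$ is a diagonal matrix with positive entries, so each $p \times p$ component scale matrix is described by roughly $pq + p$ parameters rather than $p(p+1)/2$. The degrees of freedom $\nu_k$ control the tail weight, which is the source of the robustness to outliers mentioned in the abstract.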