ALPCAH: Subspace Learning for Sample-wise Heteroscedastic Data
By: Javier Salazar Cavazos, Jeffrey A. Fessler, Laura Balzano
Potential Business Impact:
Improves data analysis when some samples are noisier than others.
Principal component analysis (PCA) is a key tool for data dimensionality reduction. However, some applications involve heterogeneous data that vary in quality due to noise characteristics associated with each data sample. Heteroscedastic methods aim to handle such mixed data quality. This paper develops a subspace learning method, named ALPCAH, that estimates the sample-wise noise variances and uses this information to improve the estimate of the subspace basis associated with the low-rank structure of the data. The method makes no distributional assumptions about the low-rank component and does not assume that the noise variances are known. Further, it uses a soft rank constraint that does not require the subspace dimension to be known. Additionally, the paper develops a matrix-factorized version of ALPCAH, named LR-ALPCAH, that is much faster and more memory-efficient at the cost of requiring the subspace dimension to be known or estimated. Simulations and real-data experiments show the effectiveness of accounting for data heteroscedasticity compared to existing algorithms. Code is available at https://github.com/javiersc1/ALPCAH.
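To make the idea concrete, below is a minimal Python/NumPy sketch of the general recipe the abstract describes: alternate between estimating the subspace from noise-whitened samples and re-estimating each sample's noise variance from its residual off that subspace. This is an illustrative heuristic only, not the paper's ALPCAH objective or solver; the function name `alpcah_sketch` and all parameter choices here are assumptions made for demonstration, and the fixed `rank` argument corresponds to the known-dimension setting rather than the soft rank constraint.

```python
# Illustrative sketch (NOT the paper's ALPCAH algorithm): alternating between
# (1) inverse-noise-std column weighting + truncated SVD for the subspace and
# (2) per-sample residual variance updates. All names here are placeholders.
import numpy as np

def alpcah_sketch(Y, rank, n_iters=20, eps=1e-8):
    """Estimate a rank-`rank` basis U and per-column noise variances v
    from data Y (d x n) with column-wise heteroscedastic noise."""
    d, n = Y.shape
    v = np.ones(n)  # start with equal noise variances for all samples
    for _ in range(n_iters):
        # (1) Whiten each column by its current noise std; this leaves the
        # signal subspace unchanged while making the noise ~homoscedastic.
        Yw = Y / np.sqrt(v + eps)
        U, _, _ = np.linalg.svd(Yw, full_matrices=False)
        U = U[:, :rank]
        # (2) Re-estimate each sample's variance from its residual off the
        # current subspace, using d - rank residual degrees of freedom.
        R = Y - U @ (U.T @ Y)
        v = np.sum(R**2, axis=0) / (d - rank)
    return U, v

# Toy usage: a rank-2 signal in R^50 with a clean group and a noisy group.
rng = np.random.default_rng(0)
d, n, r = 50, 400, 2
U_true, _ = np.linalg.qr(rng.standard_normal((d, r)))
X = U_true @ rng.standard_normal((r, n))
sigmas = np.where(np.arange(n) < n // 2, 0.1, 2.0)  # per-sample noise stds
Y = X + sigmas * rng.standard_normal((d, n))
U_hat, v_hat = alpcah_sketch(Y, rank=r)
print("subspace error:", np.linalg.norm(U_true @ U_true.T - U_hat @ U_hat.T))
```

The whitening step is the standard motivation for weighting in heteroscedastic PCA: dividing a column by its noise standard deviation rescales the signal component within the same subspace while equalizing the noise across samples, so clean samples are effectively up-weighted relative to noisy ones.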
Similar Papers
Stochastic Subspace via Probabilistic Principal Component Analysis for Characterizing Model Error
Computational Engineering, Finance, and Science
Makes computer models predict real-world behavior better.
Highly robust factored principal component analysis for matrix-valued outlier accommodation and explainable detection via matrix minimum covariance determinant
Methodology
Finds bad data points in complex pictures.
Few-Round Distributed Principal Component Analysis: Closing the Statistical Efficiency Gap by Consensus
Methodology
Improves computer analysis of huge amounts of data.