Principal Component Analysis When n < p: Challenges and Solutions

Published: March 21, 2025 | arXiv ID: 2503.17560v1

By: Nuwan Weeraratne, Lyn Hunt, Jason Kurz

Potential Business Impact:

Makes computer analysis better with messy, complex data.

Business Areas:

Predictive Analytics, Artificial Intelligence, Data and Analytics, Software

Principal Component Analysis (PCA) is a key technique for reducing the complexity of high-dimensional data while preserving its fundamental structure, so that models remain stable and interpretable. It does this by transforming the original variables into a new set of uncorrelated variables (principal components) based on the covariance structure of the original variables. However, because the traditional maximum likelihood covariance estimator does not accurately converge to the true covariance matrix, standard PCA performs poorly as a dimensionality reduction technique in high-dimensional settings where $n<p$. In this study, inspired by a fundamental issue with mean estimation when $n<p$, we propose a novel estimator called pairwise differences covariance estimation, together with four regularized versions of it, to address the problems PCA faces in high-dimensional settings with $n<p$. In empirical comparisons against existing methods (maximum likelihood estimation and its best alternative, Ledoit-Wolf estimation), all of the proposed regularized versions of pairwise differences covariance estimation perform well relative to these well-known estimators in estimating the covariance and the principal components, while minimizing the PCs' overdispersion and cosine similarity error. Real data applications are presented.
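To make the $n<p$ problem concrete, the following is a minimal NumPy sketch, not the paper's implementation. It assumes the classical identity that the unbiased sample covariance can be written from pairwise differences without estimating the mean, $S = \frac{1}{n(n-1)} \sum_{i<j} (x_i - x_j)(x_i - x_j)^\top$; the abstract does not state the authors' exact estimator or their four regularized versions, so the function names, the shrinkage toward a scaled identity, and the value `alpha=0.2` are all illustrative assumptions.

```python
import numpy as np


def mle_covariance(X):
    """Maximum likelihood covariance: centre on the sample mean, divide by n."""
    n = X.shape[0]
    Xc = X - X.mean(axis=0)
    return Xc.T @ Xc / n


def pairwise_difference_covariance(X):
    """Covariance built from pairwise differences, with no explicit mean.

    Assumed form (classical identity, not taken from the paper):
        S = 1 / (n (n - 1)) * sum_{i<j} (x_i - x_j)(x_i - x_j)^T
    """
    n = X.shape[0]
    i, j = np.triu_indices(n, k=1)   # all index pairs with i < j
    D = X[i] - X[j]                  # one row per pair
    return D.T @ D / (n * (n - 1))


def shrink_to_identity(S, alpha):
    """Hypothetical regularizer: blend S with a scaled identity matrix."""
    p = S.shape[0]
    mu = np.trace(S) / p             # average variance
    return (1 - alpha) * S + alpha * mu * np.eye(p)


rng = np.random.default_rng(0)
n, p = 20, 50                        # fewer observations than variables
X = rng.standard_normal((n, p))

S_mle = mle_covariance(X)
S_pd = pairwise_difference_covariance(X)
S_reg = shrink_to_identity(S_pd, alpha=0.2)   # alpha chosen arbitrarily

# With n < p the unregularized estimate cannot have full rank.
rank_mle = np.linalg.matrix_rank(S_mle)
rank_reg = np.linalg.matrix_rank(S_reg)

# Principal components are the eigenvectors of the covariance estimate,
# ordered here from largest to smallest eigenvalue.
eigvals, eigvecs = np.linalg.eigh(S_reg)
order = np.argsort(eigvals)[::-1]
components = eigvecs[:, order[:5]]   # top five principal directions
scores = (X - X.mean(axis=0)) @ components
```

In this sketch the unregularized estimate has rank at most $n-1$, so at least $p-n+1$ of its eigenvalues are zero and the corresponding principal directions are not identified. Shrinking toward a scaled identity restores full rank, which is one common way to regularize; whether it resembles any of the paper's four regularized versions is not something the abstract confirms.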