High-Dimensional Partial Least Squares: Spectral Analysis and Fundamental Limitations
By: Victor Léger, Florent Chatelain
Partial Least Squares (PLS) is a widely used method for data integration, designed to extract latent components shared across paired high-dimensional datasets. Despite decades of practical success, a precise theoretical understanding of its behavior in high-dimensional regimes remains limited. In this paper, we study a data integration model in which two high-dimensional data matrices share a low-rank common latent structure while also containing individual-specific components. We analyze the singular vectors of the associated cross-covariance matrix using tools from random matrix theory and derive asymptotic characterizations of the alignment between estimated and true latent directions. These results provide a quantitative explanation of the reconstruction performance of the PLS variant based on Singular Value Decomposition (PLS-SVD) and identify regimes where the method exhibits counter-intuitive or limiting behavior. Building on this analysis, we compare PLS-SVD with principal component analysis applied separately to each dataset and show that PLS-SVD is asymptotically superior in detecting the common latent subspace. Overall, our results offer a comprehensive theoretical understanding of high-dimensional PLS-SVD, clarifying both its advantages and fundamental limitations.
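To make the setup concrete, the following is a minimal numerical sketch of PLS-SVD in a toy spiked data-integration model. The single shared factor, Gaussian noise, dimensions (n, p, q), signal strengths (theta_x, theta_y), and variable names are illustrative assumptions rather than the paper's exact model, and the individual-specific components are omitted for brevity.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy spiked data-integration model (illustrative, not the paper's exact setup):
# rows of X and Y share one latent score z_i along true loadings u and v.
n, p, q = 2000, 400, 300          # sample size and feature dimensions
theta_x, theta_y = 2.0, 2.0       # signal strengths of the shared component

z = rng.standard_normal(n)                             # shared latent scores
u = rng.standard_normal(p); u /= np.linalg.norm(u)     # true direction for X
v = rng.standard_normal(q); v /= np.linalg.norm(v)     # true direction for Y

X = np.sqrt(theta_x) * np.outer(z, u) + rng.standard_normal((n, p))
Y = np.sqrt(theta_y) * np.outer(z, v) + rng.standard_normal((n, q))

# PLS-SVD: leading singular vectors of the empirical cross-covariance X^T Y / n
C = X.T @ Y / n
U_hat, s, Vt_hat = np.linalg.svd(C, full_matrices=False)
u_hat, v_hat = U_hat[:, 0], Vt_hat[0, :]

# Alignment (squared cosine) between estimated and true latent directions;
# squaring removes the sign ambiguity of singular vectors.
print("alignment on X side:", (u @ u_hat) ** 2)
print("alignment on Y side:", (v @ v_hat) ** 2)
```

The squared cosine printed at the end is the kind of alignment quantity between estimated and true latent directions whose asymptotic behavior the paper characterizes.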