Identifiability and improper solutions in the probabilistic partial least squares regression with unique variance
By: Takashi Arai
This paper addresses theoretical issues associated with probabilistic partial least squares (PLS) regression. As in factor analysis, probabilistic PLS regression with unique variance suffers from improper solutions and a lack of identifiability, both of which cause difficulties in interpreting latent variables and model parameters. Using the fact that probabilistic PLS regression can be viewed as a special case of factor analysis, we apply to its factor loading matrix a norm constraint that was recently proposed in the context of factor analysis to avoid improper solutions. We then prove that probabilistic PLS regression with this norm constraint is identifiable. We apply the model to data on amino acid mutations in Human Immunodeficiency Virus (HIV) protease to demonstrate the validity of the norm constraint and to confirm identifiability numerically. The proposed constraint also enables the visualization of latent variables via a biplot. Finally, we investigate the sampling distribution of the maximum likelihood estimator (MLE) using synthetically generated data, and observe numerically that the MLE is consistent and asymptotically normally distributed.
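The identifiability issue mentioned in the abstract can be illustrated with a minimal sketch. It assumes a generic factor-analysis form of the model, x = W_x z + e_x and y = W_y z + e_y with z ~ N(0, I) and diagonal (unique) noise variances; this parameterization, the dimensions, and all function names are illustrative assumptions, not taken from the paper. Under that assumption, rotating the stacked loading matrix by any orthogonal matrix leaves the joint covariance of (x, y) unchanged, so the loadings cannot be recovered from the data without an additional constraint.

```python
import numpy as np


def joint_covariance(W_x, W_y, psi_x, psi_y):
    """Covariance of the stacked vector (x, y) under the assumed model.

    Assumed (hypothetical) model: x = W_x z + e_x, y = W_y z + e_y,
    z ~ N(0, I), with diagonal noise variances psi_x and psi_y.
    """
    W = np.vstack([W_x, W_y])             # stacked factor loading matrix
    psi = np.concatenate([psi_x, psi_y])  # unique variances
    return W @ W.T + np.diag(psi)


def random_orthogonal(k, rng):
    """Draw a k x k orthogonal matrix via QR of a Gaussian matrix."""
    Q, _ = np.linalg.qr(rng.standard_normal((k, k)))
    return Q


rng = np.random.default_rng(0)
p, q, k = 5, 2, 2  # illustrative dimensions: predictors, responses, latent factors

W_x = rng.standard_normal((p, k))
W_y = rng.standard_normal((q, k))
psi_x = rng.uniform(0.5, 1.5, size=p)
psi_y = rng.uniform(0.5, 1.5, size=q)

R = random_orthogonal(k, rng)

sigma = joint_covariance(W_x, W_y, psi_x, psi_y)
sigma_rot = joint_covariance(W_x @ R, W_y @ R, psi_x, psi_y)

# Rotated loadings give the same Gaussian likelihood as the originals.
same_covariance = np.allclose(sigma, sigma_rot)
print("covariances match after rotation:", same_covariance)
```

The sketch shows only the rotational indeterminacy common to factor models. It does not implement the norm constraint studied in the paper, whose exact form is not given in the abstract.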
Similar Papers
A PLS-Integrated LASSO Method with Application in Index Tracking
Machine Learning (Stat)
Makes predicting stock prices more accurate.
Extreme-PLS with missing data under weak dependence
Methodology
Finds important patterns in messy, incomplete data.
Identifiability and Inference for Generalized Latent Factor Models
Methodology
Finds hidden patterns in data for better understanding.