Estimation of Semiparametric Factor Models with Missing Data
By: Sijie Zheng
Potential Business Impact:
Makes predictions from panel data more reliable when some observations are missing.
We study semiparametric factor models in high-dimensional panels where the factor loadings consist of a nonparametric component explained by observed covariates and an idiosyncratic component capturing unobserved heterogeneity. A key challenge in empirical applications is the presence of missing observations, which can distort both factor recovery and loading estimation. To address this issue, we develop a projected principal component analysis (PPCA) procedure that accommodates general missing-at-random mechanisms through inverse-probability weighting. We establish consistency and derive the asymptotic distributions of the estimated factors and loading functions, allowing the sieve dimension to diverge and permitting the time dimension T to be either fixed or growing. Unlike classical PCA, PPCA achieves consistent factor estimation even when T is fixed, and the limiting distributions under missing data exhibit mixture normality with enlarged asymptotic variances. The theoretical results are supported by simulations and an empirical application. Our findings demonstrate that PPCA provides an effective and robust framework for estimating semiparametric factor models in the presence of missing data.
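To make the procedure described in the abstract more concrete, here is a minimal Python sketch of projected PCA with inverse-probability weighting. It is an illustration under stated assumptions, not the paper's estimator: the polynomial sieve basis, the function names `sieve_basis` and `ppca_ipw`, and the treatment of the observation probabilities `prob` as known inputs are all hypothetical choices made for the example. The idea is to reweight observed entries by the inverse of their observation probability, project the reweighted panel onto the sieve space spanned by the covariates, and then extract factors by PCA on the projected data.

```python
import numpy as np


def sieve_basis(X, degree=3):
    """Polynomial sieve basis (an assumed choice): intercept plus powers of each covariate."""
    n_units, n_covariates = X.shape
    columns = [np.ones(n_units)]
    for j in range(n_covariates):
        for k in range(1, degree + 1):
            columns.append(X[:, j] ** k)
    return np.column_stack(columns)


def ppca_ipw(Y, observed, prob, X, n_factors, degree=3):
    """Sketch of projected PCA with inverse-probability weighting.

    Y         : (N, T) panel; entries where `observed` is False are ignored.
    observed  : (N, T) boolean mask of observed entries.
    prob      : scalar or (N, T) array of observation probabilities,
                assumed known here (in practice they would be estimated).
    X         : (N, d) observed covariates.
    n_factors : number of factors K to extract.
    Returns (F, G): F is (T, K) with F.T @ F / T = I, G is (N, K).
    """
    n_units, n_periods = Y.shape
    # Inverse-probability weighting: missing entries become 0, observed ones are scaled up.
    Y_weighted = np.where(observed, Y, 0.0) / prob
    # Project each time period's cross-section onto the sieve space via least squares.
    Phi = sieve_basis(X, degree)
    coef, *_ = np.linalg.lstsq(Phi, Y_weighted, rcond=None)
    PY = Phi @ coef
    # PCA on the T x T matrix built from the projected data.
    M = PY.T @ PY / n_units
    _, eigvec = np.linalg.eigh(M)  # eigenvalues come back in ascending order
    top = eigvec[:, ::-1][:, :n_factors]
    F = np.sqrt(n_periods) * top
    # Covariate-driven loading component evaluated at the observed covariates.
    G = PY @ F / n_periods
    return F, G
```

Because the PCA step operates on a T x T matrix after averaging over the cross-section, the sketch stays well defined for small T, which is in line with the abstract's remark that PPCA does not require T to grow.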
Similar Papers
Large-dimensional Factor Analysis with Weighted PCA
Methodology
Improves computer analysis of complex data.
Probabilistic PCA on tensors
Statistics Theory
Finds patterns in many connected data points.
Estimating the true number of principal components under the random design
Econometrics
Finds the best way to simplify complex data.