Model-free identification in ill-posed regression
By: Gianluca Finocchio, Tatyana Krivobokova
Potential Business Impact:
Finds the most important patterns in messy data.
The problem of parsimonious parameter identification in possibly high-dimensional linear regression with highly correlated features is addressed. This problem is formalized as the estimation of the best, in a certain sense, linear combinations of the features that are relevant to the response variable. Importantly, the dependence between the features and the response is allowed to be arbitrary. Necessary and sufficient conditions for such parsimonious identification, referred to as statistical interpretability, are established for a broad class of linear dimensionality reduction algorithms. Sharp high-probability bounds on their estimation errors are derived. To our knowledge, this is the first formal framework that enables the definition and assessment of the interpretability of a broad class of algorithms. The results are specifically applied to methods based on sparse regression, unsupervised projection and sufficient reduction. The implications of employing such methods for prediction problems are discussed in the context of the prolific literature on overparametrized methods in the regime of benign overfitting.
Similar Papers
Identifiability and Estimation in High-Dimensional Nonparametric Latent Structure Models
Statistics Theory
Finds hidden patterns in complex data better.
Nonparametric Factor Analysis and Beyond
Machine Learning (CS)
Finds hidden causes even with messy data.
Deep Learning for Subspace Regression
Machine Learning (CS)
Teaches computers to guess answers for complex problems.