Blockwise Missingness meets AI: A Tractable Solution for Semiparametric Inference
By: Qi Xu, Lorenzo Testa, Jing Lei and more
Potential Business Impact:
Fixes data with missing parts using smart guessing.
We consider parameter estimation and inference when data feature blockwise, non-monotone missingness. Our approach, rooted in semiparametric theory and inspired by prediction-powered inference, leverages off-the-shelf AI (predictive or generative) models to handle missing-completely-at-random (MCAR) mechanisms by approximating the optimal estimating equation through a novel and tractable Restricted Anova hierarchY (RAY) approximation. The resulting Inference for Blockwise Missingness (RAY), or IBM(RAY), estimator incorporates pre-trained AI models and carefully controls asymptotic variance by tuning model-specific hyperparameters. We then extend IBM(RAY) to a general class of estimators. We find the most efficient estimator in this class, which we call IBM(Adaptive), by solving a constrained quadratic programming problem. All IBM estimators are unbiased and, crucially, asymptotically achieve guaranteed efficiency gains over a naive complete-case estimator, regardless of the predictive accuracy of the AI models used. We demonstrate the finite-sample performance and numerical stability of our method through simulation studies and an application to surface protein abundance estimation.
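To give a feel for why a tuned AI correction cannot hurt asymptotically, here is a minimal sketch of a much simpler, related idea: a prediction-powered, control-variate estimate of a mean when the outcome is missing completely at random on some rows. This is not the IBM(RAY) or IBM(Adaptive) estimator from the paper. The function name `ppi_style_mean`, its arguments, and the single tuning weight `lam` are assumptions made purely for illustration.

```python
from statistics import fmean


def ppi_style_mean(y_obs, pred_obs, pred_all):
    """Illustrative prediction-assisted mean estimate (hypothetical helper).

    y_obs    : outcomes on the complete rows
    pred_obs : AI predictions on those same complete rows
    pred_all : AI predictions on every row (complete and incomplete)
    Returns (estimate, lam), where lam is the tuned weight on the correction.
    """
    n = len(y_obs)
    if n < 2 or len(pred_obs) != n:
        raise ValueError("need at least two complete rows with matching predictions")

    y_bar = fmean(y_obs)
    f_bar_obs = fmean(pred_obs)
    f_bar_all = fmean(pred_all)

    # Sample covariance of (Y, prediction) and variance of the prediction,
    # both computed on the complete rows.
    cov_yf = sum((y - y_bar) * (f - f_bar_obs)
                 for y, f in zip(y_obs, pred_obs)) / (n - 1)
    var_f = sum((f - f_bar_obs) ** 2 for f in pred_obs) / (n - 1)

    # Variance-minimizing weight for the control variate. An uninformative
    # prediction gives lam near 0, which recovers the complete-case mean.
    lam = 0.0 if var_f == 0 else cov_yf / var_f

    # Complete-case mean, corrected by how far the complete rows' average
    # prediction sits from the all-rows average prediction.
    estimate = y_bar - lam * (f_bar_obs - f_bar_all)
    return estimate, lam
```

Because `lam` is chosen to minimize variance, a poor predictor is simply down-weighted toward zero, so in large samples the estimate is no more variable than the complete-case mean. The paper's estimators pursue this kind of guarantee in the harder setting of several blocks with non-monotone missingness patterns.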
Similar Papers
Imputation-Powered Inference
Methodology
Fixes computer data that's partly missing.
Efficient Semiparametric Inference for Distributed Data with Blockwise Missingness
Methodology
Shares data safely to improve computer learning.
Robust Semiparametric Inference for Bayesian Additive Regression Trees
Methodology
Fixes computer predictions when some information is missing.