Sequential Hierarchical Regression Imputation with Variable Selection Routines
By: Qiushuang Li, Recai Yucel
Potential Business Impact:
Speeds up imputation of missing values in large health datasets by dropping unneeded predictors from the imputation models.
We aim to incorporate variable selection routines into variable-by-variable (or sequential) imputation for clustered data to achieve computational improvement in applications with large-scale health data. Specifically, we perform Bayesian variable selection using spike-and-slab priors. The choice of these priors allows us to "force" variables of importance (e.g., design variables or variables known to play a role in the missingness mechanism) into the imputation models, which are based on a class of mixed-effects models. Our ultimate goal is to improve computational speed by removing unnecessary variables from these models. We employ Markov chain Monte Carlo techniques to sample from the implied posterior distributions of the model unknowns as well as the missing data. We assess the performance of the proposed methodology via simulation studies. Our results show that the proposed algorithms lead to satisfactory estimates and, in some instances, outperform existing methods that are available to practitioners. We illustrate our methods using a national survey of children's health.
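To make the idea of "forcing" variables under a spike-and-slab prior concrete, here is a minimal sketch in Python. It is not the authors' algorithm: it assumes a single-level linear regression with no random effects and no missing data, and it uses a continuous (normal) spike in the style of stochastic search variable selection. The function name `ssvs_gibbs`, its parameters, and all default values are hypothetical choices made for illustration.

```python
import numpy as np

def ssvs_gibbs(X, y, forced=(), n_iter=1500, burn_in=500,
               tau0=0.05, tau1=5.0, prior_incl=0.5, a0=1.0, b0=1.0, seed=0):
    """Illustrative Gibbs sampler with a normal spike (sd tau0) and slab (sd tau1).

    Assumption: plain linear regression y = X beta + e, e ~ N(0, sigma2).
    Indices listed in `forced` always keep the slab prior.
    Returns posterior inclusion frequencies and posterior means of beta.
    """
    rng = np.random.default_rng(seed)
    n, p = X.shape
    forced = np.asarray(list(forced), dtype=int)
    gamma = np.ones(p, dtype=bool)          # start with every variable included
    sigma2 = 1.0
    XtX, Xty = X.T @ X, X.T @ y
    incl_sum, beta_sum = np.zeros(p), np.zeros(p)

    for it in range(n_iter):
        # 1. beta | gamma, sigma2: multivariate normal draw
        prior_var = np.where(gamma, tau1**2, tau0**2)
        cov = np.linalg.inv(XtX / sigma2 + np.diag(1.0 / prior_var))
        mean = cov @ Xty / sigma2
        beta = mean + np.linalg.cholesky(cov) @ rng.standard_normal(p)

        # 2. gamma_j | beta_j: Bernoulli with slab-vs-spike posterior odds
        log_slab = np.log(prior_incl) - np.log(tau1) - 0.5 * (beta / tau1) ** 2
        log_spike = np.log1p(-prior_incl) - np.log(tau0) - 0.5 * (beta / tau0) ** 2
        prob_slab = 1.0 / (1.0 + np.exp(log_spike - log_slab))
        gamma = rng.random(p) < prob_slab
        gamma[forced] = True                # forced variables are never dropped

        # 3. sigma2 | beta: inverse-gamma draw
        resid = y - X @ beta
        sigma2 = 1.0 / rng.gamma(a0 + n / 2.0, 1.0 / (b0 + 0.5 * resid @ resid))

        if it >= burn_in:
            incl_sum += gamma
            beta_sum += beta

    kept = n_iter - burn_in
    return incl_sum / kept, beta_sum / kept

# Simulated example: columns 0 and 3 carry signal; column 1 is noise but forced in.
rng = np.random.default_rng(1)
X = rng.standard_normal((200, 6))
y = 2.0 * X[:, 0] - 1.5 * X[:, 3] + rng.standard_normal(200)
incl, beta_mean = ssvs_gibbs(X, y, forced=[1])
print(np.round(incl, 2))
```

In a sequential imputation setting, a step like this would presumably run once per incomplete variable, with random effects and draws of the missing values added to the sampler; those parts are omitted here.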