Bayesian Variable Selection in Multivariate Regression Under Collinearity in the Design Matrix
By: Joyee Ghosh, Xun Li
Potential Business Impact:
Improves regression estimates and predictions when signals are weak, predictors and responses are highly correlated, and sample sizes are small.
We consider the problem of variable selection in Bayesian multivariate linear regression models, involving multiple response and predictor variables, under multivariate normal errors. In the absence of a known covariance structure, specifying a model with a non-diagonal covariance matrix is appealing, and modeling dependence in the random errors through such a matrix is generally expected to improve estimation of the regression coefficients. In this article, we highlight an interesting exception: modeling the dependence in errors can significantly worsen both estimation and prediction. We demonstrate that Bayesian multivariate regression models using several popular variable selection priors can suffer from poor estimation properties in low-information settings, such as scenarios with weak signals, high correlation among predictors and responses, and small sample sizes. In such cases, the simultaneous estimation of all unknown parameters in the model becomes difficult when using a non-diagonal covariance matrix. Through simulation studies and a dataset with measurements from near-infrared (NIR) spectroscopy, we illustrate that a two-step procedure, which estimates the mean and the covariance matrix separately, can provide more accurate estimates in such cases. Thus, a potential solution to avoid the problem altogether is to routinely perform an additional analysis with a diagonal covariance matrix, even if the errors are expected to be correlated.
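The two-step idea in the abstract can be illustrated with a minimal frequentist analogue: first estimate the regression coefficients as if the error covariance were diagonal (which reduces to separate least-squares fits per response), then estimate the error covariance from the residuals. This is only a sketch of the general strategy, not the authors' Bayesian procedure with variable selection priors; the simulated data and dimensions below are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative low-information setting: small n, correlated errors.
n, p, q = 30, 5, 3  # observations, predictors, responses (assumed values)
X = rng.normal(size=(n, p))
B_true = rng.normal(size=(p, q))
# Error covariance with unit variances and 0.5 cross-correlations.
Sigma_true = 0.5 * np.ones((q, q)) + 0.5 * np.eye(q)
E = rng.multivariate_normal(np.zeros(q), Sigma_true, size=n)
Y = X @ B_true + E

# Step 1: estimate the mean (coefficients) under a diagonal covariance,
# i.e., fit each response by ordinary least squares.
B_hat, *_ = np.linalg.lstsq(X, Y, rcond=None)

# Step 2: estimate the error covariance from the step-1 residuals.
resid = Y - X @ B_hat
Sigma_hat = resid.T @ resid / (n - p)
```

Because the coefficients are estimated without borrowing strength through an uncertain covariance matrix, the mean estimate is insulated from a poorly identified covariance in weak-signal, small-sample regimes, which is the intuition behind the paper's recommendation.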
Similar Papers
Scalable Bayesian inference on high-dimensional multivariate linear regression
Methodology
Scales Bayesian inference to regression problems with many predictors and responses.
Testing-driven Variable Selection in Bayesian Modal Regression
Methodology
Selects important predictors through hypothesis testing in Bayesian modal regression.
Total Robustness in Bayesian Nonlinear Regression for Measurement Error Problems under Model Misspecification
Methodology
Makes Bayesian nonlinear regression robust to measurement error and model misspecification.