Joint leave-group-out cross-validation in Bayesian spatial models
By: Alex Cooper, Aki Vehtari, Catherine Forbes
Potential Business Impact:
More reliable selection among predictive models for spatially correlated data, by reducing the variability of cross-validation estimates.
Cross-validation (CV) is a widely used method of predictive assessment based on repeated model fits to different subsets of the available data. CV is applicable in a wide range of statistical settings. However, when data are not exchangeable, the design of CV schemes should account for suspected correlation structures within the data. CV scheme design includes the selection of left-out blocks and the choice of scoring function for evaluating predictive performance. This paper focuses on the impact of two scoring strategies for block-wise CV applied to spatial models with Gaussian covariance structures. Through several experiments, we investigate whether evaluating the predictive performance of blocks of left-out observations jointly, rather than aggregating individual (pointwise) predictions, improves model selection performance. Extending recent findings for data with serial correlation (such as time-series data), our experiments suggest that joint scoring reduces the variability of CV estimates, leading to more reliable model selection, particularly when spatial dependence is strong and model differences are subtle.
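To make the joint-versus-pointwise distinction concrete, here is a minimal illustrative sketch (not the authors' code; the covariance kernel, block size, and values are assumed for illustration). For a held-out block with a Gaussian predictive distribution, the pointwise log score sums univariate log densities at each location, ignoring within-block correlation, while the joint log score evaluates the block under the full multivariate predictive:

```python
import numpy as np
from scipy.stats import multivariate_normal, norm

rng = np.random.default_rng(0)

# Predictive mean and covariance for a block of 4 held-out spatial
# locations, with strong within-block correlation (assumed values).
mu = np.zeros(4)
dist = np.abs(np.subtract.outer(np.arange(4), np.arange(4)))
Sigma = np.exp(-dist / 2.0)  # exponential covariance kernel (illustrative)

# Simulate an "observed" held-out block from the joint predictive.
y = multivariate_normal(mu, Sigma).rvs(random_state=rng)

# Pointwise score: sum of univariate log densities at each location,
# which discards the correlation between locations in the block.
pointwise = norm(mu, np.sqrt(np.diag(Sigma))).logpdf(y).sum()

# Joint score: log density of the whole block under the joint
# multivariate Gaussian predictive, which accounts for correlation.
joint = multivariate_normal(mu, Sigma).logpdf(y)

print(f"pointwise log score: {pointwise:.3f}")
print(f"joint log score:     {joint:.3f}")
```

The two scores generally disagree whenever the predictive covariance is non-diagonal; the paper's experiments concern which scoring rule yields lower-variance CV estimates for model comparison.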
Similar Papers
The use of cross validation in the analysis of designed experiments
Applications
Helps computers pick the best way to understand data.
Optimal Data Splitting for Holdout Cross-Validation in Large Covariance Matrix Estimation
Statistics Theory
Improves computer guesses about data patterns.