Statistical Modeling of Combinatorial Response Data
By: Yu Zheng, Malay Ghosh, Leo Duan
Potential Business Impact:
Helps understand complex choices with many rules.
There is a rich literature on modeling binary and polychotomous responses. However, existing methods are inadequate for handling combinatorial responses, where each response is an integer array subject to additional constraints. Such data are increasingly common in modern applications, such as surveys collected under skip logic, event propagation on a network, and observed matching in ecology. Ignoring the combinatorial structure leads to biased estimation and prediction. The fundamental challenge is the lack of a link function that connects a linear or functional predictor with a probability respecting the combinatorial constraints. In this article, we propose a novel augmented likelihood that views a combinatorial response as a deterministic transform of a continuous latent variable. We specify the transform as the maximizer of an integer linear program, and characterize useful properties such as a dual thresholding representation. When taking a Bayesian approach and considering a multivariate normal distribution for the latent variable, our method becomes a direct generalization of the celebrated probit data augmentation, and enjoys straightforward computation via a Gibbs sampler. We provide theoretical justification, including consistency and applicability, at an interesting intersection between duality and probability. We demonstrate the effectiveness of our method through simulations and a data application on the seasonal matching between waterfowl.
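The latent-variable construction in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the constraint set ("at most two entries equal 1"), the identity covariance, the brute-force enumeration in place of an integer linear programming solver, and the function names `feasible_set`, `ilp_transform`, and `simulate_response` are all assumptions chosen for illustration. It only shows the forward direction, drawing a normal latent vector and mapping it to the feasible integer array that maximizes a linear objective; the Gibbs sampler itself is not sketched.

```python
import itertools
import random


def feasible_set(p, is_feasible):
    """Enumerate all binary arrays of length p that satisfy the constraint."""
    return [y for y in itertools.product((0, 1), repeat=p) if is_feasible(y)]


def ilp_transform(z, candidates):
    """Return the feasible y that maximizes the linear objective sum_j z_j * y_j.

    Brute-force enumeration stands in for an integer linear programming
    solver; it is only practical for very small problems.
    """
    return max(candidates, key=lambda y: sum(zj * yj for zj, yj in zip(z, y)))


def simulate_response(mu, candidates, rng):
    """Draw z ~ N(mu, I) (identity covariance assumed) and transform it."""
    z = [rng.gauss(m, 1.0) for m in mu]
    return ilp_transform(z, candidates)


# Hypothetical constraint: at most two entries of y may equal 1.
candidates = feasible_set(4, lambda y: sum(y) <= 2)

# Deterministic transform of a fixed latent vector.
y_fixed = ilp_transform([1.5, -0.3, 0.7, 0.2], candidates)
print(y_fixed)  # (1, 0, 1, 0)

# Simulated responses for a hypothetical latent mean (e.g. a linear predictor).
rng = random.Random(0)
mu = [1.0, -1.0, 0.5, 0.0]
draws = [simulate_response(mu, candidates, rng) for _ in range(1000)]
```

If the constraint is dropped, so that every binary array is feasible, the maximizer sets each entry to 1 exactly when the corresponding latent value is positive. That is the usual probit thresholding, which is one way to read the abstract's claim that the method generalizes probit data augmentation.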
Similar Papers
Bayesian nonparametric modeling of mixed-type bounded data
Methodology
Helps understand mixed health data better.
Maximum Likelihood for Logistic Regression Model with Incomplete and Hybrid-Type Covariates
Methodology
Fixes computer math when some numbers are missing.
Robust Spatio-Temporal Distributional Regression
Methodology
Helps guess hidden sizes from limited measurements.