Semiparametric inference for inequality measures under nonignorable nonresponse using callback data
By: Xinyu Wang, Chunlin Wang, Tao Yu and more
This paper develops semiparametric methods for estimation and inference of widely used inequality measures when survey data are subject to nonignorable nonresponse, a challenging setting in which response probabilities depend on the unobserved outcomes. Such nonresponse mechanisms are common in household surveys and invalidate standard inference procedures due to selection bias and lack of population representativeness. We address this problem by exploiting callback data from repeated contact attempts and adopting a semiparametric model that leaves the outcome distribution unspecified. We construct semiparametric full-likelihood estimators for the underlying distribution and the associated inequality measures, and establish their large-sample properties for a broad class of functionals, including quantiles, the Theil index, and the Gini index. Explicit asymptotic variance expressions are derived, enabling valid Wald-type inference under nonignorable nonresponse. To facilitate implementation, we propose a stable and computationally convenient expectation-maximization algorithm, whose steps either admit closed-form expressions or reduce to fitting a standard logistic regression model. Simulation studies demonstrate that the proposed procedures effectively correct nonresponse bias and achieve near-benchmark efficiency. An application to Consumer Expenditure Survey data illustrates the practical gains from incorporating callback information when making inference on inequality measures.
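As a rough illustration of the functionals named in the abstract, the sketch below computes plug-in Gini and Theil indices for a discrete distribution that places probability mass `w[i]` on the observed outcome `y[i]`. The function names and the weights are hypothetical: the weights stand in for whatever probability masses an estimator of the outcome distribution would assign, and nothing here reproduces the paper's full-likelihood estimator or its expectation-maximization algorithm.

```python
import numpy as np


def weighted_gini(y, w):
    """Gini index of the discrete distribution with mass w[i] at y[i]."""
    y = np.asarray(y, dtype=float)
    w = np.asarray(w, dtype=float)
    w = w / w.sum()  # normalise the masses so they sum to one
    mu = np.sum(w * y)  # mean of the weighted distribution
    # Mean absolute difference E|Y - Y'| for two independent draws.
    mad = np.sum(w[:, None] * w[None, :] * np.abs(y[:, None] - y[None, :]))
    return mad / (2.0 * mu)


def weighted_theil(y, w):
    """Theil index E[(Y/mu) log(Y/mu)]; assumes all y[i] are positive."""
    y = np.asarray(y, dtype=float)
    w = np.asarray(w, dtype=float)
    w = w / w.sum()
    mu = np.sum(w * y)
    r = y / mu
    return np.sum(w * r * np.log(r))


# Hypothetical outcomes and masses, for illustration only.
y = [1.0, 3.0]
w = [0.5, 0.5]
gini = weighted_gini(y, w)
theil = weighted_theil(y, w)
```

With equal weights the functions reduce to the usual sample versions of the indices; unequal weights show how an estimate of the outcome distribution that corrects for nonresponse would shift the resulting inequality measures.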
Similar Papers
Semiparametric Causal Inference for Right-Censored Outcomes with Many Weak Invalid Instruments
Methodology
Finds true causes even with missing data.
Bounds on inequality with incomplete data
Econometrics
Measures wealth gaps even with messy data.