Score: 1

Efficient Difference-in-Differences Estimation when Outcomes are Missing at Random

Published: September 29, 2025 | arXiv ID: 2509.25009v1

By: Lorenzo Testa, Edward H. Kennedy, Matthew Reimherr

BigTech Affiliations: Amazon

Potential Business Impact:

Enables valid difference-in-differences estimates when some outcome data are missing.

Business Areas:
A/B Testing, Data and Analytics

The Difference-in-Differences (DiD) method is a fundamental tool for causal inference, yet its application is often complicated by missing data. Although recent work has developed robust DiD estimators for complex settings like staggered treatment adoption, these methods typically assume complete data and fail to address the critical challenge of outcomes that are missing at random (MAR), a common problem that invalidates standard estimators. We develop a rigorous framework, rooted in semiparametric theory, for identifying and efficiently estimating the Average Treatment Effect on the Treated (ATT) when either pre- or post-treatment (or both) outcomes are missing at random. We first establish nonparametric identification of the ATT under two minimal sets of sufficient conditions. For each, we derive the semiparametric efficiency bound, which provides a formal benchmark for asymptotic optimality. We then propose novel estimators that are asymptotically efficient, achieving this theoretical bound. A key feature of our estimators is their multiple robustness, which ensures consistency even if some nuisance function models are misspecified. We validate the properties of our estimators and showcase their broad applicability through an extensive simulation study.
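
To illustrate the problem the abstract describes, the following is a minimal sketch, not the authors' estimator. It assumes a two-period design, a single binary covariate x, and outcomes that are missing at random given x and treatment group a. The toy data, the function names (complete_case_mean, ipw_mean, did, make_group) and the choice of a simple inverse-probability-weighted (Hajek) mean are all assumptions made for illustration; the paper's efficient, multiply robust estimators are more involved.

```python
from collections import defaultdict

def complete_case_mean(rows, key):
    """Plain mean over units whose outcome `key` is observed."""
    vals = [r[key] for r in rows if r[key] is not None]
    return sum(vals) / len(vals)

def ipw_mean(rows, key):
    """Hajek inverse-probability-weighted mean within one treatment group.

    The observation probability is estimated nonparametrically as the
    observed fraction within each level of x (assumes every level of x
    has at least one observed outcome).
    """
    n_by_x = defaultdict(int)
    obs_by_x = defaultdict(int)
    for r in rows:
        n_by_x[r["x"]] += 1
        obs_by_x[r["x"]] += int(r[key] is not None)
    num = den = 0.0
    for r in rows:
        if r[key] is not None:
            w = n_by_x[r["x"]] / obs_by_x[r["x"]]  # 1 / estimated P(observed | x, group)
            num += w * r[key]
            den += w
    return num / den

def did(rows, mean_fn):
    """(post - pre) among treated minus (post - pre) among controls."""
    treated = [r for r in rows if r["a"] == 1]
    control = [r for r in rows if r["a"] == 0]
    return ((mean_fn(treated, "y1") - mean_fn(treated, "y0"))
            - (mean_fn(control, "y1") - mean_fn(control, "y0")))

def make_group(a, y_by_x):
    """Four units per level of x, with fixed (pre, post) outcomes per level."""
    return [{"a": a, "x": x, "y0": y0, "y1": y1}
            for x, (y0, y1) in y_by_x.items() for _ in range(4)]

# Hypothetical toy data: the full-data DiD contrast is (6 - 3) - (5 - 3) = 1.
data = (make_group(1, {0: (1.0, 3.0), 1: (5.0, 9.0)})
        + make_group(0, {0: (1.0, 2.0), 1: (5.0, 8.0)}))

# Hide the post-treatment outcome for 3 of the 4 treated units with x == 1,
# so missingness depends on (x, a) but not on the outcome itself.
hidden = 0
for r in data:
    if r["a"] == 1 and r["x"] == 1 and hidden < 3:
        r["y1"] = None
        hidden += 1

cc_estimate = did(data, complete_case_mean)
ipw_estimate = did(data, ipw_mean)
print(cc_estimate, ipw_estimate)
```

In this toy setup the complete-case contrast comes out at about -0.8, with the wrong sign, while the weighted contrast recovers the full-data value of 1. The weighted version relies on the observation probabilities being estimated correctly, which is the kind of dependence on a single nuisance model that the multiple robustness described in the abstract is meant to relax.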

Country of Origin
🇺🇸 United States

Page Count
20 pages

Category
Statistics: Methodology