Data Integration With Biased Summary Data via Generalized Entropy Balancing
By: Kosuke Morikawa, Sho Komukai, Satoshi Hattori
Potential Business Impact:
Helps studies use outside summary data without introducing bias.
Statistical methods for integrating individual-level data with external summary data have attracted attention because of their potential to reduce data collection costs. Summary data are often available from public sources and relatively easy to obtain, which makes them a practical resource for improving the precision of statistical estimation. Typically, these methods assume that the internal and external data originate from the same underlying distribution. When this assumption is violated, incorporating external data risks introducing bias, primarily because the background distributions of the current study and the external source differ. In practice, the primary interest usually lies not in quantities specific to the external data distribution but in quantities related to the individual-level internal data. In this paper, we propose a methodology based on generalized entropy balancing that integrates external summary data even when they are derived from biased samples. The method is doubly robust, providing enhanced protection against model misspecification. Importantly, its applicability can be assessed directly from the available data. We illustrate the versatility and effectiveness of the proposed estimator through an analysis of Nationwide Public-Access Defibrillation data in Japan.
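To make the balancing idea concrete, here is a minimal sketch of plain entropy balancing (exponential tilting). This is not the authors' generalized entropy balancing estimator. It assumes the external summaries describe the same population as the internal data, so it has none of the bias correction or double robustness described in the abstract. What it does show is the core calibration step: the internal records are reweighted so that their weighted covariate means match means reported by an external source, while the weights stay as close as possible to uniform in Kullback-Leibler divergence. The function name `entropy_balance`, its parameters, and the simulated data are hypothetical.

```python
import numpy as np


def entropy_balance(X, target_mean, tol=1e-10, max_iter=100):
    """Hypothetical helper: weights summing to 1 that minimise KL divergence
    from uniform weights subject to the weighted column means of X equalling
    target_mean (plain entropy balancing, not the paper's generalized form)."""
    Z = np.asarray(X, dtype=float) - np.asarray(target_mean, dtype=float)
    lam = np.zeros(Z.shape[1])  # one Lagrange multiplier per matched moment

    def dual(trial):
        # Convex dual objective: log-sum-exp of the tilted scores Z @ trial.
        s = Z @ trial
        return s.max() + np.log(np.exp(s - s.max()).sum())

    for _ in range(max_iter):
        s = Z @ lam
        w = np.exp(s - s.max())
        w /= w.sum()                    # current exponential-tilting weights
        grad = Z.T @ w                  # weighted mean of Z; zero at optimum
        if np.max(np.abs(grad)) < tol:
            break
        # Hessian of the dual = weighted covariance of Z under w.
        hess = (Z * w[:, None]).T @ Z - np.outer(grad, grad)
        step = np.linalg.solve(hess, grad)
        # Backtracking (Armijo) line search keeps the Newton update stable.
        t, f0 = 1.0, dual(lam)
        while dual(lam - t * step) > f0 - 0.5 * t * grad @ step and t > 1e-10:
            t /= 2
        lam = lam - t * step
    return w


# Hypothetical usage: 200 internal records with two covariates, and an
# external source that reports only the covariate means.
rng = np.random.default_rng(0)
X_internal = rng.normal(size=(200, 2))
external_means = np.array([0.3, -0.2])
w = entropy_balance(X_internal, external_means)
print(w @ X_internal)  # matches external_means up to the tolerance
```

Solving the convex dual over the multipliers, rather than over the 200 weights directly, reduces the problem to one unknown per matched moment, which is why Newton's method converges in a handful of iterations here.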