Relaxing the Assumption of Strongly Non-Informative Linkage Error in Secondary Regression Analysis of Linked Files
By: Priyanjali Bukke, Martin Slawski
Potential Business Impact:
Fixes mistakes when combining different data.
Analyses of data files produced by linking records from multiple sources are often affected by linkage errors: records may be linked incorrectly, or true links may be missed. Consequently, such errors must be taken into account to ensure valid post-linkage inference. Here, we propose an extension of a general framework, developed in prior work, for regression with linked covariates and responses based on a two-component mixture model. This framework addresses the challenging case of secondary analysis, in which only the linked data are available and information about the record linkage process is limited. The extension considered herein relaxes the framework's assumption of strongly non-informative linkage, under which linkage does not depend on the covariates used in the analysis; this assumption may be limiting in practice. The effectiveness of the proposed extension is investigated through simulations and a case study.
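The abstract does not give the model equations. As a rough illustration only, the sketch below assumes one form of a two-component mixture for linked data. A correctly linked response follows a Gaussian linear model given its covariates. A falsely linked response is drawn from the marginal distribution of y. To mimic linkage that depends on the covariates, the mismatch probability is modelled here as a logistic function of the covariates, and all parameters are fitted by EM. The function `fit_mixture_em`, the logistic mismatch model, the Gaussian approximation to the marginal of y, and all simulation settings are hypothetical choices for this sketch. None of them is taken from the paper, and this is not the authors' estimator.

```python
import numpy as np


def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))


def fit_mixture_em(X, y, Z, n_iter=200, newton_steps=5):
    """Hypothetical EM sketch for the two-component mixture
        f(y | x, z) = (1 - a(z)) * N(y; x @ beta, s2) + a(z) * g(y),
    with mismatch rate a(z) = sigmoid(z @ gamma) and marginal density g of y.
    Returns (beta, s2, gamma)."""
    # Assumption: g is approximated by a Gaussian fitted to the observed y.
    mu_y, sd_y = y.mean(), y.std()
    g = np.exp(-0.5 * ((y - mu_y) / sd_y) ** 2) / (np.sqrt(2 * np.pi) * sd_y)

    # Start from naive least squares, which ignores linkage errors.
    beta = np.linalg.lstsq(X, y, rcond=None)[0]
    s2 = np.mean((y - X @ beta) ** 2)
    gamma = np.zeros(Z.shape[1])

    for _ in range(n_iter):
        # E-step: posterior probability w_i that record i is correctly linked.
        a = np.clip(sigmoid(Z @ gamma), 1e-10, 1 - 1e-10)
        f = np.exp(-0.5 * (y - X @ beta) ** 2 / s2) / np.sqrt(2 * np.pi * s2)
        w = (1 - a) * f / ((1 - a) * f + a * g)

        # M-step for (beta, s2): least squares weighted by w.
        Xw = X * w[:, None]
        beta = np.linalg.solve(X.T @ Xw, Xw.T @ y)
        s2 = np.sum(w * (y - X @ beta) ** 2) / np.sum(w)

        # M-step for gamma: logistic regression of the soft mismatch labels
        # (1 - w) on Z, via a few Newton-Raphson steps.
        for _ in range(newton_steps):
            p = sigmoid(Z @ gamma)
            grad = Z.T @ ((1 - w) - p)
            hess = Z.T @ (Z * (p * (1 - p))[:, None])
            gamma = gamma + np.linalg.solve(hess, grad)
    return beta, s2, gamma


# Simulated example with informative linkage (all settings hypothetical):
# the chance of a false link increases with the covariate x.
rng = np.random.default_rng(0)
n = 5000
x = rng.normal(size=n)
X = np.column_stack([np.ones(n), x])
y_true = X @ np.array([1.0, 2.0]) + rng.normal(scale=0.5, size=n)
mismatch = rng.random(n) < sigmoid(-1.5 + 1.0 * x)
y = y_true.copy()
# A falsely linked record receives the response of a randomly chosen record.
y[mismatch] = y_true[rng.integers(0, n, size=mismatch.sum())]

beta_naive = np.linalg.lstsq(X, y, rcond=None)[0]
beta_hat, s2_hat, gamma_hat = fit_mixture_em(X, y, Z=X)
print("naive OLS:", beta_naive)
print("mixture EM:", beta_hat, "gamma:", gamma_hat)
```

In this sketch, passing an intercept-only design such as `Z=np.ones((n, 1))` forces a single mismatch rate shared by all records. That corresponds to the strongly non-informative case. Letting `Z` contain the analysis covariates is one simple way to let the linkage depend on them.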