Group Identification and Variable Selection in Multivariable Mendelian Randomization with Highly-Correlated Exposures

Published: November 15, 2025 | arXiv ID: 2511.12375v1

By: Yinxiang Wu, Neil M. Davies, Ting Ye

Potential Business Impact:

Finds groups of health risks causing heart disease.

Business Areas:
A/B Testing, Data and Analytics

Multivariable Mendelian Randomization (MVMR) estimates the direct causal effects of multiple risk factors on an outcome using genetic variants as instruments. The growing availability of summary-level genetic data has created opportunities to apply MVMR in high-dimensional settings with many strongly correlated candidate risk factors. However, existing methods face three major limitations: weak instrument bias, limited interpretability, and the absence of valid post-selection inference. Here we introduce MVMR-PACS, a method that identifies signal-groups -- sets of causal risk factors with high genetic correlation or indistinguishable causal effects -- and estimates the direct effect of each group. MVMR-PACS minimizes a debiased objective function that reduces weak instrument bias while yielding interpretable estimates with theoretical guarantees for variable selection. We adapt a data-thinning strategy to summary-data MVMR to enable valid post-selection inference. In simulations, MVMR-PACS outperforms existing approaches in both estimation accuracy and variable selection. When applied to 27 lipoprotein subfraction traits and coronary artery disease risk, MVMR-PACS identifies biologically meaningful and robust signal-groups with interpretable direct causal effects.
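The abstract does not spell out how data thinning is adapted to summary-data MVMR, so the following is only a minimal sketch of generic Gaussian data thinning, not the authors' procedure. It assumes each summary statistic is approximately normal with a known standard error; the function name `thin_gaussian` and the split fraction `eps` are hypothetical names introduced for illustration. The idea is that one estimate is split into two independent parts, so one part can drive variable selection and the other can be used for inference afterwards.

```python
import numpy as np


def thin_gaussian(beta_hat, se, eps=0.5, rng=None):
    """Split Gaussian estimates into two independent parts (illustrative only).

    Assumes beta_hat ~ N(beta, se^2) with se known. Then
      part_select ~ N(eps * beta, eps * se^2)
      part_infer  ~ N((1 - eps) * beta, (1 - eps) * se^2)
    and the two parts are independent and sum to beta_hat.
    """
    if not 0.0 < eps < 1.0:
        raise ValueError("eps must lie strictly between 0 and 1")
    rng = np.random.default_rng() if rng is None else rng
    beta_hat = np.asarray(beta_hat, dtype=float)
    se = np.asarray(se, dtype=float)

    # Auxiliary noise scaled so that the covariance of the two parts cancels.
    noise = rng.standard_normal(beta_hat.shape)
    part_select = eps * beta_hat + np.sqrt(eps * (1.0 - eps)) * se * noise
    part_infer = beta_hat - part_select
    return part_select, part_infer


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    true_beta = 0.3                       # hypothetical SNP-exposure effect
    se = np.full(5, 0.1)                  # hypothetical standard errors
    beta_hat = true_beta + se * rng.standard_normal(5)

    part_select, part_infer = thin_gaussian(beta_hat, se, eps=0.5, rng=rng)
    # Rescale each part so it is unbiased for beta (with inflated variance).
    print(part_select / 0.5)
    print(part_infer / 0.5)
```

Dividing each part by its fraction (`eps` or `1 - eps`) gives an unbiased but noisier estimate of the same effect, which is the usual trade-off: a larger `eps` leaves more information for selection and less for the subsequent inference.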

Page Count
63 pages

Category
Statistics: Methodology