Univariate-Guided Sparse Regression for Biobank-Scale High-Dimensional Omics Data
By: Joshua Richland, Tuomo Kiiskinen, William Wang, and more
Potential Business Impact:
Predicts disease risk from genes using far fewer genetic markers.
We present a scalable framework for computing polygenic risk scores (PRS) in high-dimensional genomic settings using the recently introduced Univariate-Guided Sparse Regression (uniLasso). UniLasso is a two-stage penalized regression procedure that uses the signs and magnitudes of univariate coefficients to stabilize feature selection and improve interpretability. Building on its theoretical and empirical advantages, we adapt uniLasso to the UK Biobank, a population-based repository of over one million genetic variants measured on hundreds of thousands of individuals from the United Kingdom. We further extend the framework to incorporate external summary statistics, which increases predictive accuracy. Our results show that uniLasso attains predictive performance comparable to the standard Lasso while selecting substantially fewer variants, yielding sparser and more interpretable models. Moreover, it estimates PRS more accurately than competing methods such as PRS-CS. Integrating external scores further improves prediction while preserving sparsity.
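The two-stage procedure described in the abstract can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes the uniLasso recipe as we understand it from the original proposal (stage 1 fits one simple regression per variant and computes its leave-one-out fitted values; stage 2 runs a lasso of the outcome on those fitted values with nonnegative coefficients). It uses a single fixed penalty `lam` instead of cross-validation, omits the external-summary-statistics extension, and the function names (`loo_univariate_fits`, `nonneg_lasso`, `unilasso`) and simulated genotypes are hypothetical.

```python
import numpy as np


def loo_univariate_fits(X, y):
    """Stage 1: one simple regression y ~ a_j + b_j * x_j per column of X.

    Returns intercepts a (p,), slopes b (p,), and an (n, p) matrix of
    leave-one-out fitted values obtained in closed form via the identity
    loo_i = y_i - resid_i / (1 - h_ii). Assumes no column of X is constant.
    """
    n = X.shape[0]
    xbar = X.mean(axis=0)
    xc = X - xbar
    sxx = (xc ** 2).sum(axis=0)
    b = xc.T @ (y - y.mean()) / sxx
    a = y.mean() - b * xbar
    fitted = a + X * b                      # in-sample fit of each model
    leverage = 1.0 / n + xc ** 2 / sxx      # hat-matrix diagonals
    resid = y[:, None] - fitted
    loo = y[:, None] - resid / (1.0 - leverage)
    return a, b, loo


def nonneg_lasso(F, y, lam, n_iter=500, tol=1e-8):
    """Stage 2: minimize (1/2n)||y - t0 - F t||^2 + lam * sum(t) with t >= 0,
    using cyclic coordinate descent on centered data."""
    n, p = F.shape
    Fm, ym = F.mean(axis=0), y.mean()
    Fc, yc = F - Fm, y - ym
    z = (Fc ** 2).sum(axis=0) / n
    theta = np.zeros(p)
    r = yc.copy()                           # residual yc - Fc @ theta
    for _ in range(n_iter):
        max_step = 0.0
        for j in range(p):
            rho = Fc[:, j] @ r / n + z[j] * theta[j]
            new = max(0.0, rho - lam) / z[j]  # soft-threshold, clip at 0
            step = new - theta[j]
            if step != 0.0:
                r -= Fc[:, j] * step
                theta[j] = new
                max_step = max(max_step, abs(step))
        if max_step < tol:
            break
    theta0 = ym - Fm @ theta
    return theta0, theta


def unilasso(X, y, lam):
    """Combine both stages into intercept beta0 and per-variant weights beta."""
    a, b, loo = loo_univariate_fits(X, y)
    theta0, theta = nonneg_lasso(loo, y, lam)
    beta = theta * b                        # theta >= 0, so signs follow b
    beta0 = theta0 + theta @ a
    return beta0, beta


# Hypothetical demo: simulated SNP dosages (0/1/2) with three causal variants.
rng = np.random.default_rng(0)
n, p = 200, 50
X = rng.binomial(2, 0.3, size=(n, p)).astype(float)
true_beta = np.zeros(p)
true_beta[:3] = [1.0, -0.8, 0.6]
y = X @ true_beta + rng.normal(size=n)

beta0, beta = unilasso(X, y, lam=0.05)
prs = beta0 + X @ beta                      # score = weighted sum of dosages
print("selected variants:", np.flatnonzero(beta))
```

Because the stage-2 coefficients are constrained to be nonnegative, every selected variant keeps the sign of its univariate association, which is one way to read the abstract's claim about stability and interpretability. The resulting score for an individual is simply the fitted intercept plus a weighted sum of that individual's genotype dosages over the selected variants.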
Similar Papers
Univariate-Guided Sparse Regression for Biobank-Scale High-Dimensional -omics Data
Methodology
Finds genetic links to diseases more accurately.
Penalized Linear Models for Highly Correlated High-Dimensional Immunophenotyping Data
Applications
Finds hidden health clues in complex body data.