Scalable Ultra-High-Dimensional Quantile Regression with Genomic Applications
By: Hanqing Wu, Jonas Wallin, Iuliana Ionita-Laza
Potential Business Impact:
Helps find patterns in huge, messy data.
Modern datasets arising from social media, genomics, and biomedical informatics are often heterogeneous and (ultra) high-dimensional, creating substantial challenges for conventional modeling techniques. Quantile regression (QR) not only offers a flexible way to capture heterogeneous effects across the conditional distribution of an outcome, but also naturally produces prediction intervals that help quantify uncertainty in future predictions. However, classical QR methods can face serious memory and computational constraints in large-scale settings. These limitations motivate the use of parallel computing to maintain tractability. While extensive work has examined sample-splitting strategies in settings where the number of observations $n$ greatly exceeds the number of features $p$, the equally important (ultra) high-dimensional regime ($p \gg n$) has been comparatively underexplored. To address this gap, we introduce a feature-splitting proximal point algorithm, FS-QRPPA, for penalized QR in the high-dimensional regime. Leveraging recent developments in variational analysis, we establish a Q-linear convergence rate for FS-QRPPA and demonstrate its superior scalability in large-scale genomic applications from the UK Biobank relative to existing methods. Moreover, FS-QRPPA yields more accurate coefficient estimates and better coverage for prediction intervals than current approaches. We provide a parallel implementation in the R package fsQRPPA, making penalized QR tractable on large-scale datasets.
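To make the link between quantile regression and prediction intervals concrete, the following minimal sketch uses the standard check (pinball) loss $\rho_\tau(r) = r\,(\tau - \mathbf{1}\{r < 0\})$ on toy data. It is not FS-QRPPA and does not use the fsQRPPA package: there are no features, no penalty, and no feature splitting. The function names and the toy outcomes are assumptions made for illustration only.

```python
def pinball_loss(residual, tau):
    """Check (pinball) loss: rho_tau(r) = r * (tau - 1[r < 0])."""
    return residual * (tau - (1.0 if residual < 0 else 0.0))


def mean_pinball(y, q, tau):
    """Average check loss of the constant prediction q over the outcomes y."""
    return sum(pinball_loss(yi - q, tau) for yi in y) / len(y)


def best_constant(y, tau):
    """Intercept-only quantile fit: search the observed values for the
    constant that minimizes the average check loss (hypothetical helper)."""
    return min(sorted(y), key=lambda q: mean_pinball(y, q, tau))


# Toy outcomes with one large outlier (illustrative data, not from the paper).
y = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0, 50.0]

# Fitting the 0.1 and 0.9 quantiles gives a nominal 80% prediction interval.
lower = best_constant(y, 0.1)
upper = best_constant(y, 0.9)
inside = sum(lower <= yi <= upper for yi in y)

print(f"interval: [{lower}, {upper}], covers {inside} of {len(y)} points")
```

On this toy data the two fits are 2.0 and 10.0, and the resulting interval covers 9 of the 11 points without being stretched by the outlier. In a regression setting the constant is replaced by a linear predictor, and in the $p \gg n$ regime a penalty on the coefficients is added, which is the problem the abstract's algorithm targets.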
Similar Papers
Factor Augmented Quantile Regression Model
Methodology
Helps computers understand complicated data better.
High-Dimensional Precision Matrix Quadratic Forms: Estimation Framework for $p > n$
Methodology
Helps understand complex data when there's more info than examples.
StaRQR-K: False Discovery Rate Controlled Regional Quantile Regression
Methodology
Finds gene changes affecting cancer growth.