Robust reduced rank regression under heavy-tailed noise and missing data via non-convex penalization
By: The Tien Mai
Reduced rank regression (RRR) is a fundamental tool for modeling multiple responses through low-dimensional latent structures, offering both interpretability and strong predictive performance in high-dimensional settings. Classical RRR methods, however, typically rely on the squared loss and Gaussian noise assumptions, which makes them sensitive to heavy-tailed errors, outliers, and data contamination. Moreover, missing data, which is common in modern applications, further complicates reliable low-rank estimation. In this paper, we propose a robust reduced rank regression framework that simultaneously addresses heavy-tailed noise, outliers, and missing data. Our approach combines the robust Huber loss with nonconvex spectral regularization, specifically the minimax concave penalty (MCP) and the smoothly clipped absolute deviation (SCAD) penalty. Unlike convex nuclear-norm regularization, the proposed nonconvex penalties alleviate excessive shrinkage and enable more accurate recovery of the underlying low-rank structure. The method also accommodates missing entries in the response matrix without requiring imputation. We develop an efficient proximal gradient algorithm based on alternating updates and tailored spectral thresholding. Extensive simulation studies demonstrate that the proposed methods substantially outperform nuclear-norm-based and non-robust alternatives under heavy-tailed noise and contamination. An application to a cancer cell line data set further illustrates the practical advantages of the proposed robust RRR framework. Our method is implemented in the R package rrpackrobust, available at https://github.com/tienmt/rrpackrobust.
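The abstract names the ingredients (Huber loss, MCP spectral penalty, observed-entries-only loss, proximal gradient with spectral thresholding) but not the exact update rules. The following is a minimal Python sketch under stated assumptions, not the rrpackrobust implementation or its API. It assumes the model Y ≈ XB with a p×q coefficient matrix B, the objective (1/n)·Σ over observed (i, j) of Huber(Y_ij − (XB)_ij) plus Σ_k MCP(σ_k(B); λ, γ), a plain (non-alternating) proximal gradient loop with step 0.9/L, and the conventional Huber threshold δ = 1.345. All function names (`huber_psi`, `mcp_prox`, `spectral_mcp`, `robust_rrr_mcp`) are hypothetical. SCAD would replace only the scalar thresholding rule in `mcp_prox`.

```python
import numpy as np

def huber_psi(r, delta):
    """Derivative of the Huber loss: identity on [-delta, delta], clipped outside."""
    return np.clip(r, -delta, delta)

def mcp_prox(s, lam, gamma, step):
    """Proximal map of step * MCP(.; lam, gamma) for nonnegative inputs s
    (firm thresholding). Assumes gamma > step so the scalar problem is convex."""
    out = np.where(s <= step * lam, 0.0, (s - step * lam) / (1.0 - step / gamma))
    return np.where(s > gamma * lam, s, out)  # large values are left unshrunk

def spectral_mcp(Z, lam, gamma, step):
    """Apply MCP thresholding to the singular values of Z (spectral thresholding)."""
    U, s, Vt = np.linalg.svd(Z, full_matrices=False)
    return (U * mcp_prox(s, lam, gamma, step)) @ Vt

def robust_rrr_mcp(X, Y, mask, lam, gamma=3.0, delta=1.345, n_iter=500, tol=1e-6):
    """Hypothetical proximal-gradient solver: Huber loss on observed entries only
    (mask == True), MCP penalty on the singular values of B."""
    n, p = X.shape
    q = Y.shape[1]
    Y0 = np.where(mask, Y, 0.0)               # unobserved entries are never used
    L = np.linalg.norm(X, 2) ** 2 / n         # Lipschitz constant of the loss gradient
    step = 0.9 / L
    if gamma <= step:
        raise ValueError("need gamma > step for the MCP proximal map")
    B = np.zeros((p, q))
    for _ in range(n_iter):
        R = Y0 - X @ B
        grad = -(X.T @ (mask * huber_psi(R, delta))) / n  # masked residuals drop out
        B_new = spectral_mcp(B - step * grad, lam, gamma, step)
        converged = np.linalg.norm(B_new - B) <= tol * max(1.0, np.linalg.norm(B))
        B = B_new
        if converged:
            break
    return B

# Synthetic check: rank-2 B, Student-t(2) noise, about 20% of responses missing.
rng = np.random.default_rng(0)
n, p, q = 200, 10, 8
U, _ = np.linalg.qr(rng.standard_normal((p, 2)))
V, _ = np.linalg.qr(rng.standard_normal((q, 2)))
B_true = U @ np.diag([8.0, 5.0]) @ V.T
X = rng.standard_normal((n, p))
Y = X @ B_true + rng.standard_t(df=2, size=(n, q))
mask = rng.random((n, q)) > 0.2
Y[~mask] = np.nan                              # missing responses, no imputation

B_hat = robust_rrr_mcp(X, Y, mask, lam=1.0)
rank_hat = int(np.sum(np.linalg.svd(B_hat, compute_uv=False) > 1e-8))
rel_err = np.linalg.norm(B_hat - B_true) / np.linalg.norm(B_true)
print(rank_hat, round(float(rel_err), 3))
```

The design choice the sketch illustrates: because MCP leaves singular values above γλ untouched, the two true directions (singular values 8 and 5, both above γλ = 3) are not shrunk, whereas nuclear-norm soft thresholding would subtract a constant from every retained singular value. Missing responses enter only through `mask`, which zeroes their residuals in the gradient.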