Sparse classification with positive-confidence data in high dimensions
By: The Tien Mai, Mai Anh Nguyen, Trung Nghia Nguyen
High-dimensional learning problems, where the number of features exceeds the sample size, often require sparse regularization for effective prediction and variable selection. While such techniques are well established for fully supervised data, they remain underexplored in weak-supervision settings such as Positive-Confidence (Pconf) classification. Pconf learning uses only positive samples equipped with confidence scores, thereby avoiding the need for negative data. However, existing Pconf methods are ill-suited to high-dimensional regimes. This paper proposes a novel sparse-penalization framework for high-dimensional Pconf classification. We introduce estimators based on convex (Lasso) and non-convex (SCAD, MCP) penalties to address shrinkage bias and improve feature recovery. Theoretically, we establish estimation and prediction error bounds for the L1-regularized Pconf estimator, proving that it achieves near minimax-optimal sparse recovery rates under a restricted strong convexity condition. To solve the resulting composite objective, we develop an efficient proximal gradient algorithm. Extensive simulations demonstrate that our proposed methods achieve predictive performance and variable-selection accuracy comparable to fully supervised approaches, effectively bridging the gap between weak supervision and high-dimensional statistics.
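For concreteness, a hedged sketch of the kind of objective the abstract describes. In the Pconf setting, the classification risk can be rewritten using only positive samples x_i and their confidences r_i = p(y=+1 | x_i) (a rewriting due to Ishida et al., 2018), and an L1-penalized estimator then takes, up to constants, the form below; the exact loss and weighting in the paper may differ.

```latex
% Hedged sketch, not the paper's exact notation: the Pconf risk rewriting
% with an added L1 penalty. Here r_i = p(y=+1 | x_i) is the confidence of
% positive sample x_i, \ell is a margin loss (e.g., logistic), and
% \lambda > 0 is the regularization level.
\[
\hat{\beta} \in \arg\min_{\beta \in \mathbb{R}^p}
\; \frac{1}{n} \sum_{i=1}^{n}
\left[ \ell\big(x_i^\top \beta\big)
  + \frac{1 - r_i}{r_i}\, \ell\big(-x_i^\top \beta\big) \right]
  + \lambda \lVert \beta \rVert_1 .
\]
```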
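And a minimal proximal gradient (ISTA) sketch for this Lasso-penalized objective with logistic loss, assuming the formulation above. The function name pconf_lasso, the fixed step size, and the iteration count are illustrative assumptions, not the paper's algorithm, which may use acceleration, line search, or the SCAD/MCP proximal maps instead.

```python
import numpy as np
from scipy.special import expit  # numerically stable sigmoid

def soft_threshold(z, t):
    """Elementwise soft-thresholding: the proximal operator of t * ||.||_1."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def pconf_lasso(X, r, lam, step=0.01, n_iter=2000):
    """Proximal gradient (ISTA) for an L1-penalized Pconf objective with
    logistic loss ell(z) = log(1 + exp(-z)).

    X   : (n, p) array of positive samples only
    r   : (n,) confidence scores r_i = p(y=+1 | x_i), assumed in (0, 1]
    lam : L1 regularization strength
    """
    n, p = X.shape
    w = (1.0 - r) / r  # per-sample weight on the surrogate "negative" term
    beta = np.zeros(p)
    for _ in range(n_iter):
        z = X @ beta
        # d/dz ell(z) = -sigmoid(-z);  d/dz ell(-z) = +sigmoid(z)
        grad = X.T @ (-expit(-z) + w * expit(z)) / n
        beta = soft_threshold(beta - step * grad, step * lam)
    return beta

# Illustrative usage on synthetic high-dimensional data (p > n):
rng = np.random.default_rng(0)
X = rng.standard_normal((100, 500))
r = rng.uniform(0.55, 0.95, size=100)  # stand-in confidence scores
beta_hat = pconf_lasso(X, r, lam=0.05)
print(int((beta_hat != 0).sum()), "nonzero coefficients")
```

The soft-thresholding step is what induces exact zeros in the estimate; swapping it for the SCAD or MCP proximal map would give the non-convex variants the abstract mentions.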
Similar Papers
Sparse learning with concave regularization: relaxation of the irrepresentable condition
Optimization and Control
Shows that concave penalties can recover the true sparse support under weaker conditions than the Lasso's irrepresentable condition.
Cost-Sensitive Conformal Training with Provably Controllable Learning Bounds
Machine Learning (CS)
Trains conformal predictors with cost-sensitive objectives and provably controllable learning bounds.
Sparse Activations as Conformal Predictors
Machine Learning (CS)
Interprets sparse activations as conformal predictors that output calibrated sets of candidate labels.