A Unified and Stable Risk Minimization Framework for Weakly Supervised Learning with Theoretical Guarantees
By: Miao Zhang, Junpeng Li, Changchun Hua, and more
Potential Business Impact:
Teaches computers to learn with less labeled information.
Weakly supervised learning has emerged as a practical alternative to fully supervised learning when complete and accurate labels are costly or infeasible to acquire. However, many existing methods are tailored to specific supervision patterns -- such as positive-unlabeled (PU), unlabeled-unlabeled (UU), complementary-label (CLL), partial-label (PLL), or similarity-unlabeled annotations -- and rely on post-hoc corrections to mitigate instability induced by indirect supervision. We propose a principled, unified framework that bypasses such post-hoc adjustments by directly formulating a stable surrogate risk grounded in the structure of weakly supervised data. The formulation naturally subsumes diverse settings -- including PU, UU, CLL, PLL, multi-class unlabeled, and tuple-based learning -- under a single optimization objective. We further establish a non-asymptotic generalization bound via Rademacher complexity that clarifies how supervision structure, model capacity, and sample size jointly govern performance. Beyond this, we analyze the effect of class-prior misspecification on the bound, deriving explicit terms that quantify its impact, and we study identifiability, giving sufficient conditions -- most notably via supervision stratification across groups -- under which the target risk is recoverable. Extensive experiments show consistent gains across class priors, dataset scales, and class counts -- without heuristic stabilization -- while exhibiting robustness to overfitting.
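The abstract does not state the surrogate risk explicitly, so as a point of reference here is a minimal NumPy sketch of the classical unbiased positive-unlabeled (PU) risk estimator that such frameworks rewrite. It is not the paper's stable surrogate; the negative correction term in this estimator is the source of the instability that post-hoc fixes (and, per the abstract, the proposed formulation) aim to avoid. The logistic surrogate and function names are illustrative assumptions.

```python
import numpy as np

def logistic_loss(scores, y):
    # y in {+1, -1}; standard logistic surrogate loss, applied elementwise.
    return np.log1p(np.exp(-y * scores))

def pu_risk(w, X_pos, X_unl, prior):
    """Classical unbiased PU risk estimate (du Plessis et al. style):
    R(g) = pi * E_P[l(g(x), +1)] + E_U[l(g(x), -1)] - pi * E_P[l(g(x), -1)].
    The subtracted term can drive the empirical estimate negative, which is
    the instability that post-hoc corrections (e.g. non-negative clipping)
    are usually introduced to mitigate.
    """
    s_pos = X_pos @ w          # scores on labeled positives
    s_unl = X_unl @ w          # scores on unlabeled mixture
    risk_pos = prior * logistic_loss(s_pos, +1).mean()
    risk_unl_neg = logistic_loss(s_unl, -1).mean()
    risk_pos_neg = prior * logistic_loss(s_pos, -1).mean()
    return risk_pos + risk_unl_neg - risk_pos_neg

# Toy usage with synthetic data and an assumed class prior of 0.4.
rng = np.random.default_rng(0)
X_pos = rng.normal(loc=+1.0, size=(50, 2))    # labeled positives
X_unl = rng.normal(loc=0.0, size=(200, 2))    # unlabeled mixture
w = np.array([1.0, 1.0])
print(pu_risk(w, X_pos, X_unl, prior=0.4))
```

The same rewriting idea extends to the other supervision patterns the abstract lists (UU, complementary-label, partial-label, tuple-based), each yielding its own correction terms; the paper's contribution is to replace such corrected estimators with a single stable objective.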
Similar Papers
Learning from Uncertain Similarity and Unlabeled Data
Machine Learning (CS)
Protects privacy while teaching computers to learn.
Reliable Programmatic Weak Supervision with Confidence Intervals for Label Probabilities
Machine Learning (Stat)
Makes computer guesses for data more trustworthy.
Cost-Sensitive Unbiased Risk Estimation for Multi-Class Positive-Unlabeled Learning
Machine Learning (CS)
Helps computers learn from good and unknown examples.