Cost-Sensitive Unbiased Risk Estimation for Multi-Class Positive-Unlabeled Learning
By: Miao Zhang, Junpeng Li, Changchun Hua, and more
Potential Business Impact:
Helps computers learn from good and unknown examples.
Positive-Unlabeled (PU) learning considers settings in which only positive and unlabeled data are available, while negatives are missing or left unlabeled. This situation is common in real applications where annotating reliable negatives is difficult or costly. Despite substantial progress in PU learning, the multi-class case (MPU) remains challenging: many existing approaches do not ensure unbiased risk estimation, which limits performance and stability. We propose a cost-sensitive multi-class PU method based on adaptive loss weighting. Within the empirical risk minimization framework, we assign distinct, data-dependent weights to the positive and inferred-negative (from the unlabeled mixture) loss components so that the resulting empirical objective is an unbiased estimator of the target risk. We formalize the MPU data-generating process and establish a generalization error bound for the proposed estimator. Extensive experiments on eight public datasets, spanning varying class priors and numbers of classes, show consistent gains over strong baselines in both accuracy and stability.
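To make the unbiased-estimation idea concrete, below is a minimal PyTorch sketch of the standard unbiased MPU risk under the usual mixture assumption (the unlabeled distribution is a prior-weighted mixture of the K-1 positive classes and the unseen negative class). The function name mpu_unbiased_risk, the choice of cross-entropy as the surrogate loss, and the fixed prior weights are illustrative assumptions, not the paper's implementation; in particular, the paper's adaptive, data-dependent weighting is not reproduced here.

    import torch
    import torch.nn.functional as F

    def mpu_unbiased_risk(pos_logits, unl_logits, priors, num_classes):
        # Hypothetical sketch, not the paper's method. Classes 0..K-2 have
        # labeled positives; class K-1 is the unobserved "negative" class.
        # priors[k] = pi_k, the class prior of positive class k.
        # Under p_U(x) = sum_k pi_k p_k(x) + pi_neg p_neg(x), the target risk
        #   R(g) = sum_k pi_k E_{P_k}[l(g(x), k)] + pi_neg E_N[l(g(x), K-1)]
        # can be estimated without negative labels by rewriting the last term:
        #   E_U[l(g(x), K-1)] - sum_k pi_k E_{P_k}[l(g(x), K-1)].
        neg = num_classes - 1
        pos_risk = unl_logits.new_zeros(())
        # Inferred-negative component, started from the unlabeled mixture.
        inferred_neg = F.cross_entropy(
            unl_logits, torch.full((unl_logits.size(0),), neg, dtype=torch.long))
        for k, logits_k in enumerate(pos_logits):  # one logits tensor per positive class
            target_k = torch.full((logits_k.size(0),), k, dtype=torch.long)
            target_neg = torch.full((logits_k.size(0),), neg, dtype=torch.long)
            pos_risk = pos_risk + priors[k] * F.cross_entropy(logits_k, target_k)
            inferred_neg = inferred_neg - priors[k] * F.cross_entropy(logits_k, target_neg)
        # On finite samples inferred_neg can dip below zero, a known source of
        # instability; non-negative corrections (Kiryo et al., 2017) clamp it at zero.
        return pos_risk + inferred_neg

With K = 2 this reduces to the classical binary unbiased PU risk estimator of du Plessis et al.; the paper's contribution, as described in the abstract, is to replace the fixed prior weights with adaptive, data-dependent ones while keeping the empirical objective unbiased.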
Similar Papers
Heterogeneous Multisource Transfer Learning via Model Averaging for Positive-Unlabeled Data
Machine Learning (Stat)
Finds bad guys using less information.
Adaptive Pseudo Label Selection for Individual Unlabeled Data by Positive and Unlabeled Learning
CV and Pattern Recognition
Helps doctors find sickness in X-rays better.
Constraint Multi-class Positive and Unlabeled Learning for Distantly Supervised Named Entity Recognition
Computation and Language
Helps computers find important words in text.