Learning from M-Tuple Dominant Positive and Unlabeled Data
By: Jiahe Qin, Junpeng Li, Changchun Hua, and more
Potential Business Impact:
Teaches computers to guess what's inside when labels are fuzzy.
Learning from Label Proportions (LLP) addresses the classification problem where multiple instances are grouped into bags and each bag is annotated only with the proportion of each class it contains. However, in practical applications, obtaining precise supervisory information about these class proportions is challenging. To better align with real-world scenarios and effectively leverage the proportional constraints among instances within tuples, this paper proposes a generalized learning framework, MDPU (learning from M-tuple Dominant Positive and Unlabeled data). Specifically, we first mathematically model the distribution of instances within tuples of arbitrary size, under the constraint that the number of positive instances is no less than the number of negative instances. We then derive an unbiased risk estimator that satisfies risk consistency, based on the empirical risk minimization (ERM) framework. To mitigate the overfitting that such an estimator inevitably suffers during training, we introduce a risk correction method, yielding a corrected risk estimator. We establish generalization error bounds for the unbiased risk estimator, theoretically demonstrating the consistency of the proposed method. Extensive experiments on multiple datasets and comparisons with relevant baseline methods comprehensively validate the effectiveness of the proposed learning framework.
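To give a concrete sense of the risk-correction idea the abstract describes, below is a minimal Python sketch of the standard non-negative correction used in PU learning (in the style of Kiryo et al.'s nnPU), where the unbiased negative-class risk term is clipped at zero to curb overfitting. This is an illustration of the general technique, not the paper's tuple-level MDPU estimator; the function names, the `pi_p` prior parameter, and the sigmoid surrogate loss are all assumptions for the example.

```python
import numpy as np

def sigmoid_loss(z):
    # Sigmoid surrogate loss l(z) = 1 / (1 + exp(z)),
    # a common choice in PU risk estimators (illustrative assumption).
    return 1.0 / (1.0 + np.exp(z))

def corrected_pu_risk(scores_p, scores_u, pi_p):
    """Non-negative (corrected) PU risk, nnPU-style sketch.

    scores_p : model outputs on labeled positive instances
    scores_u : model outputs on unlabeled instances
    pi_p     : assumed class prior (proportion of positives)
    """
    risk_p_pos = sigmoid_loss(scores_p).mean()    # positives scored as positive
    risk_p_neg = sigmoid_loss(-scores_p).mean()   # positives scored as negative
    risk_u_neg = sigmoid_loss(-scores_u).mean()   # unlabeled scored as negative

    # Unbiased estimate of the negative-class risk; on finite samples it can
    # dip below zero, which is exactly what drives overfitting.
    neg_risk = risk_u_neg - pi_p * risk_p_neg

    # Risk correction: clip the negative-class term at zero.
    return pi_p * risk_p_pos + max(0.0, neg_risk)

# Usage sketch: scores would come from any classifier f(x).
rng = np.random.default_rng(0)
print(corrected_pu_risk(rng.normal(1, 1, 100), rng.normal(0, 1, 200), pi_p=0.5))
```

The clipping step is the key design choice: without it, minimizing the unbiased estimator lets a flexible model push the empirical negative-class risk arbitrarily negative. The paper's corrected estimator applies the same principle to its M-tuple dominant-positive setting.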
Similar Papers
Optimal Learning from Label Proportions with General Loss Functions
Machine Learning (CS)
Teaches computers to guess labels from group data.
Nearly Optimal Sample Complexity for Learning with Label Proportions
Machine Learning (CS)
Teaches computers from group data, not single examples.
Cost-Sensitive Unbiased Risk Estimation for Multi-Class Positive-Unlabeled Learning
Machine Learning (CS)
Helps computers learn from good and unknown examples.