Optimal Learning from Label Proportions with General Loss Functions
By: Lorne Applebaum, Travis Dick, Claudio Gentile, and more
Potential Business Impact:
Teaches computers to guess labels from group data.
Motivated by problems in online advertising, we address the task of Learning from Label Proportions (LLP). In this partially supervised setting, training data consists of groups of examples, termed bags, for which we only observe the average label value. The main goal, however, remains the design of a predictor for the labels of individual examples. We introduce a novel and versatile low-variance de-biasing methodology to learn from aggregate label information, significantly advancing the state of the art in LLP. Our approach exhibits remarkable flexibility, seamlessly accommodating a broad spectrum of practically relevant loss functions across both binary and multi-class classification settings. By carefully combining our estimators with standard techniques, we substantially improve sample complexity guarantees for a large class of losses of practical relevance. We also validate the efficacy of our proposed approach across a diverse array of benchmark datasets, demonstrating compelling empirical advantages over standard baselines.
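To make the LLP setting concrete, here is a minimal sketch on synthetic data: a logistic model trained with a standard bag-level proportion-matching squared loss, a common LLP baseline. All names and the data-generating setup are hypothetical illustrations; this is not the de-biased estimator proposed in the paper, only the problem setup it improves upon.

```python
# Minimal sketch of the Learning from Label Proportions (LLP) setup.
# Hypothetical illustration: trains a logistic model with a bag-level
# "proportion matching" squared loss -- a common LLP baseline, NOT the
# low-variance de-biased estimator proposed in the paper.
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data: individual labels exist but are never shown to the learner.
d, n_bags, bag_size = 5, 200, 8
w_true = rng.normal(size=d)
bags = rng.normal(size=(n_bags, bag_size, d))
labels = (bags @ w_true + 0.1 * rng.normal(size=(n_bags, bag_size)) > 0).astype(float)
proportions = labels.mean(axis=1)  # the only supervision the learner observes

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

w = np.zeros(d)
lr = 0.5
for _ in range(500):
    p = sigmoid(bags @ w)          # per-example predictions, shape (n_bags, bag_size)
    bag_pred = p.mean(axis=1)      # predicted label proportion per bag
    err = bag_pred - proportions   # bag-level residuals, shape (n_bags,)
    # Gradient of 0.5 * mean_b (bag_pred_b - proportion_b)^2 with respect to w
    grad = np.einsum('b,bi,bij->j', err, p * (1 - p), bags) / (bag_size * n_bags)
    w -= lr * grad

# Evaluate instance-level accuracy -- the actual goal in LLP, even though
# training only ever saw bag-level label proportions.
pred = (sigmoid(bags @ w) > 0.5).astype(float)
print("instance-level accuracy:", (pred == labels).mean())
```

The sketch highlights the core tension the paper addresses: supervision arrives only as per-bag averages, yet performance is measured on individual examples, and naive bag-level losses like the one above can suffer high variance that de-biased estimators aim to reduce.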
Similar Papers
Nearly Optimal Sample Complexity for Learning with Label Proportions
Machine Learning (CS)
Teaches computers from group data, not single examples.
Learning from M-Tuple Dominant Positive and Unlabeled Data
Machine Learning (CS)
Teaches computers to guess what's inside when labels are fuzzy.
Sample-Efficient Omniprediction for Proper Losses
Machine Learning (CS)
Helps computers make better choices for everyone.