Optimal Learning from Label Proportions with General Loss Functions
By: Lorne Applebaum, Travis Dick, Claudio Gentile, and more
Potential Business Impact:
Teaches computers to guess labels from group data.
Motivated by problems in online advertising, we address the task of Learning from Label Proportions (LLP). In this partially supervised setting, training data consists of groups of examples, termed bags, for which we only observe the average label value. The main goal, however, remains the design of a predictor for the labels of individual examples. We introduce a novel and versatile low-variance de-biasing methodology to learn from aggregate label information, significantly advancing the state of the art in LLP. Our approach exhibits remarkable flexibility, seamlessly accommodating a broad spectrum of practically relevant loss functions across both binary and multi-class classification settings. By carefully combining our estimators with standard techniques, we substantially improve sample complexity guarantees for a large class of losses of practical relevance. We also validate the efficacy of our proposed approach across a diverse array of benchmark datasets, demonstrating compelling empirical advantages over standard baselines.
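To make the LLP setting concrete, here is a minimal sketch on synthetic data: a logistic model trained with a standard bag-level proportion-matching squared loss, a common LLP baseline. All names and the data-generating setup are hypothetical illustrations; this is not the de-biased estimator proposed in the paper, only the problem setup it improves upon.

```python
# Minimal sketch of the Learning from Label Proportions (LLP) setup.
# Hypothetical illustration: trains a logistic model with a bag-level
# "proportion matching" squared loss -- a common LLP baseline, NOT the
# low-variance de-biased estimator proposed in the paper.
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data: individual labels exist but are never shown to the learner.
d, n_bags, bag_size = 5, 200, 8
w_true = rng.normal(size=d)
bags = rng.normal(size=(n_bags, bag_size, d))
labels = (bags @ w_true + 0.1 * rng.normal(size=(n_bags, bag_size)) > 0).astype(float)
proportions = labels.mean(axis=1)  # the only supervision the learner observes

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

w = np.zeros(d)
lr = 0.5
for _ in range(500):
    p = sigmoid(bags @ w)          # per-example predictions, shape (n_bags, bag_size)
    bag_pred = p.mean(axis=1)      # predicted label proportion per bag
    err = bag_pred - proportions   # bag-level residuals, shape (n_bags,)
    # Gradient of 0.5 * mean_b (bag_pred_b - proportion_b)^2 with respect to w
    grad = np.einsum('b,bi,bij->j', err, p * (1 - p), bags) / (bag_size * n_bags)
    w -= lr * grad

# Evaluate instance-level accuracy -- the actual goal in LLP, even though
# training only ever saw bag-level label proportions.
pred = (sigmoid(bags @ w) > 0.5).astype(float)
print("instance-level accuracy:", (pred == labels).mean())
```

The sketch highlights the core tension the paper addresses: supervision arrives only as per-bag averages, yet performance is measured on individual examples, and naive bag-level losses like the one above can suffer high variance that de-biased estimators aim to reduce.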
Similar Papers
Nearly Optimal Sample Complexity for Learning with Label Proportions
Machine Learning (CS)
Teaches computers from group data, not single examples.
Learning from M-Tuple Dominant Positive and Unlabeled Data
Machine Learning (CS)
Teaches computers to guess what's inside when labels are fuzzy.
Sample-Efficient Omniprediction for Proper Losses
Machine Learning (CS)
Helps computers make better choices for everyone.