arXiv:2505.05355 [cs.LG]

Nearly Optimal Sample Complexity for Learning with Label Proportions

Robert Busa-Fekete, Travis Dick, Claudio Gentile, Haim Kaplan, Tomer Koren, Uri Stemmer

Published 2025-05-08 (Version 1)

We investigate Learning from Label Proportions (LLP), a partial-information setting where the examples in a training set are grouped into bags and only aggregate label values for each bag are available. Despite this partial observability, the goal is still to achieve small regret at the level of individual examples. We give sample complexity bounds for LLP under square loss and show that these bounds are essentially optimal. From an algorithmic viewpoint, we rely on carefully designed variants of Empirical Risk Minimization and Stochastic Gradient Descent, combined with ad hoc variance reduction techniques. On one hand, our theoretical results improve in important ways on the existing LLP literature, specifically in how the sample complexity depends on the bag size. On the other hand, we validate our algorithmic solutions on several datasets, demonstrating improved empirical performance (better accuracy with fewer samples) against recent baselines.
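To make the setting concrete, the following is a minimal, hypothetical sketch of SGD under square loss when only per-bag average labels are observed; it is an illustration of the LLP setting, not the paper's algorithm. It assumes a linear model, i.i.d. zero-mean features, and synthetic data invented for the example. Because the square-loss gradient is linear in the label, the per-bag term `k * mean(x) * mean(y)` is an unbiased estimate of the label-dependent part of the gradient, with variance that grows with the bag size k, which hints at why the dependence of sample complexity on bag size is the delicate quantity.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical synthetic LLP instance: n examples in d dimensions, split into
# bags of size k; only each bag's average label (the "label proportion") is seen.
n, d, k = 4096, 10, 8
w_true = rng.normal(size=d)
X = rng.normal(size=(n, d))                     # zero-mean features (assumed below)
y = X @ w_true + 0.1 * rng.normal(size=n)       # individual labels, never observed
bags_X = X.reshape(n // k, k, d)
bag_avg_y = y.reshape(n // k, k).mean(axis=1)   # the only label information available

# SGD on square loss with a per-bag gradient estimate. The population gradient
# is 2*E[x x^T] w - 2*E[x y]: the first term uses fully observed features, and
# since the second term is linear in y, k * mean(x) * mean(y) over a bag is an
# unbiased estimate of E[x y] whenever E[x] = 0 (a plain estimator; the paper's
# variance reduction techniques refine this kind of construction).
w = np.zeros(d)
lr = 0.002
for epoch in range(200):
    for b in rng.permutation(len(bags_X)):
        Xb = bags_X[b]
        cov_term = (Xb.T @ (Xb @ w)) / k                  # estimates E[x x^T] w
        label_term = k * Xb.mean(axis=0) * bag_avg_y[b]   # estimates E[x y]
        w -= lr * 2.0 * (cov_term - label_term)

print("relative parameter error:",
      np.linalg.norm(w - w_true) / np.linalg.norm(w_true))
```

Note that the naive alternative of using each bag's average label as a surrogate target for every example in the bag is biased in general; the debiased cross term above is what makes the expected update point at the population risk gradient.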

Related articles:
arXiv:2310.11707 [cs.LG] (Published 2023-10-18)
Learning under Label Proportions for Text Classification
arXiv:1810.10328 [cs.LG] (Published 2018-10-24)
Label Propagation for Learning with Label Proportions
arXiv:2004.03515 [cs.LG] (Published 2020-04-07)
On the Complexity of Learning from Label Proportions