arXiv:1806.10745 Abstract | arXiv Analytics

arXiv:1806.10745 [cs.LG]Abstract References Reviews Resources

Contextual bandits with surrogate losses: Margin bounds and efficient algorithms

Published 2018-06-28Version 1

We introduce a new family of margin-based regret guarantees for adversarial contextual bandit learning. Our results are based on multiclass surrogate losses. Using the ramp loss, we derive a universal margin-based regret bound in terms of the sequential metric entropy for a benchmark class of real-valued regression functions. The new margin bound serves as a complete contextual bandit analogue of the classical margin bound from statistical learning. The result applies to large nonparametric classes, improving on the best known results for Lipschitz contextual bandits (Cesa-Bianchi et al., 2017) and, as a special case, generalizes the dimension-independent Banditron regret bound (Kakade et al., 2008) to arbitrary linear classes with smooth norms. On the algorithmic side, we use the hinge loss to derive an efficient algorithm with a $\sqrt{dT}$-type mistake bound against benchmark policies induced by $d$-dimensional regression functions. This provides the first hinge loss-based solution to the open problem of Abernethy and Rakhlin (2009). With an additional i.i.d. assumption we give a simple oracle-efficient algorithm whose regret matches our generic metric entropy-based bound for sufficiently complex nonparametric classes. Under realizability assumptions our results also yield classical regret bounds.

Categories: cs.LG, stat.ML

Keywords: efficient algorithm, surrogate losses, regression functions, dimension-independent banditron regret bound, complete contextual bandit analogue

Related articles: Most relevant | Search more

arXiv:2010.01818 [cs.LG] (Published 2020-10-05)

An Efficient Algorithm for Cooperative Semi-Bandits

Riccardo Della Vecchia, Tommaso Cesari

arXiv:1410.6414 [cs.LG] (Published 2014-10-22)

A Parallel and Efficient Algorithm for Learning to Match

Jingbo Shang, Tianqi Chen, Hang Li, Zhengdong Lu, Yong Yu

arXiv:1708.04781 [cs.LG] (Published 2017-08-16)

Racing Thompson: an Efficient Algorithm for Thompson Sampling with Non-conjugate Priors

Yichi Zhou, Jun Zhu, Jingwei Zhuo