arXiv:1202.4473 [cs.LG]

The best of both worlds: stochastic and adversarial bandits

Sébastien Bubeck, Aleksandrs Slivkins

Published 2012-02-20 (Version 1)

We present a new bandit algorithm, SAO (Stochastic and Adversarial Optimal), whose regret is essentially optimal both for adversarial rewards and for stochastic rewards. Specifically, SAO combines the square-root worst-case regret of Exp3 (Auer et al., SIAM J. on Computing 2002) with the (poly)logarithmic regret that UCB1 (Auer et al., Machine Learning 2002) achieves for stochastic rewards. Adversarial rewards and stochastic rewards are the two main settings in the literature on (non-Bayesian) multi-armed bandits; prior work treats them separately and does not attempt to optimize jointly for both. Our result falls into a general theme of achieving good worst-case performance while also taking advantage of "nice" problem instances, an important issue in the design of algorithms with partially known inputs.
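The abstract names the two baselines that SAO combines; as a reading aid, here is a minimal sketch of both in their textbook forms from the cited Auer et al. papers. This is not the SAO algorithm itself, and the pull function and the Bernoulli environment below are hypothetical stand-ins used purely for illustration. UCB1 plays the arm with the highest optimistic index (empirical mean plus a confidence radius), which gives logarithmic regret on stochastic instances; Exp3 maintains exponential weights over importance-weighted reward estimates, which gives square-root worst-case regret against an adversary.

import math
import random

def ucb1(pull, K, horizon):
    """UCB1 (Auer et al., Machine Learning 2002): play the arm with the
    highest optimistic index; rewards are assumed to lie in [0, 1]."""
    counts = [0] * K
    means = [0.0] * K
    total = 0.0
    for t in range(1, horizon + 1):
        if t <= K:
            arm = t - 1  # initialization: play each arm once
        else:
            arm = max(range(K),
                      key=lambda i: means[i] + math.sqrt(2 * math.log(t) / counts[i]))
        r = pull(arm)
        counts[arm] += 1
        means[arm] += (r - means[arm]) / counts[arm]  # incremental average
        total += r
    return total

def exp3(pull, K, horizon, gamma=0.1):
    """Exp3 (Auer et al., SIAM J. on Computing 2002): exponential weights
    over importance-weighted reward estimates; rewards in [0, 1]."""
    weights = [1.0] * K
    total = 0.0
    for _ in range(horizon):
        wsum = sum(weights)
        probs = [(1 - gamma) * w / wsum + gamma / K for w in weights]
        arm = random.choices(range(K), weights=probs)[0]
        r = pull(arm)
        total += r
        est = r / probs[arm]  # unbiased estimate of the chosen arm's reward
        weights[arm] *= math.exp(gamma * est / K)
        m = max(weights)
        weights = [w / m for w in weights]  # rescale; probs are unchanged
    return total

if __name__ == "__main__":
    # Hypothetical 3-armed Bernoulli instance, purely for illustration.
    true_means = [0.2, 0.5, 0.8]
    def pull(i):
        return 1.0 if random.random() < true_means[i] else 0.0
    print("UCB1 reward over 10000 rounds:", ucb1(pull, K=3, horizon=10_000))
    print("Exp3 reward over 10000 rounds:", exp3(pull, K=3, horizon=10_000))

The contrast is the point: UCB1's index exploits i.i.d. structure but can be misled by an adversary, while Exp3's randomization is robust in the worst case but forgoes the logarithmic rate on stochastic instances. The paper's contribution, per the abstract, is a single algorithm that essentially attains both guarantees at once.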

Related articles:
arXiv:1702.06103 [cs.LG] (Published 2017-02-20)
An Improved Parametrization and Analysis of the EXP3++ Algorithm for Stochastic and Adversarial Bandits
arXiv:1807.07623 [cs.LG] (Published 2018-07-19)
An Optimal Algorithm for Stochastic and Adversarial Bandits
arXiv:1811.12253 [cs.LG] (Published 2018-10-23)
Unifying the stochastic and the adversarial Bandits with Knapsack