arXiv Analytics

arXiv:1807.07623 [cs.LG]

An Optimal Algorithm for Stochastic and Adversarial Bandits

Julian Zimmert, Yevgeny Seldin

Published 2018-07-19, Version 1

We provide an algorithm that achieves the optimal (up to constants) finite-time regret in both adversarial and stochastic multi-armed bandits without prior knowledge of the regime or the time horizon. The result provides a negative answer to the open question of whether an extra price has to be paid for the lack of information about the adversariality/stochasticity of the environment. We provide a complete characterization of online mirror descent algorithms based on Tsallis entropy and show that the power ${\alpha} = \frac{1}{2}$ achieves the goal. In addition, the proposed algorithm enjoys improved regret guarantees in two intermediate regimes: the moderately contaminated stochastic regime defined by Seldin and Slivkins (2014) and the stochastically constrained adversary studied by Wei and Luo (2018). The algorithm also achieves adversarial and stochastic optimality in the utility-based dueling bandit setting.
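To make the abstract's key object concrete, here is a minimal sketch of one round of online mirror descent with the $\frac{1}{2}$-Tsallis entropy regularizer. Under this regularizer the sampling distribution has the closed form $w_i = 4/(\eta_t (L_i - x))^2$, where $L_i$ is the cumulative loss estimate of arm $i$ and $x < \min_i L_i$ is a normalizer chosen so the weights sum to one; the sketch finds $x$ by Newton's method. The plain importance-weighted loss estimator and the learning-rate constant below are illustrative choices, not necessarily the exact configuration analyzed in the paper.

```python
import math
import random

def tsallis_inf_weights(L, eta, iters=50):
    """Sampling distribution of OMD with the 1/2-Tsallis entropy:
    w_i = 4 / (eta * (L_i - x))^2, with x < min(L) chosen so sum_i w_i = 1.
    The normalizer x is found by Newton's method on f(x) = sum_i w_i(x) - 1."""
    K = len(L)
    # At this starting point sum_i w_i <= 1, so we begin at or below the root.
    x = min(L) - 2.0 * math.sqrt(K) / eta
    for _ in range(iters):
        w = [4.0 / (eta * (Li - x)) ** 2 for Li in L]
        f = sum(w) - 1.0
        if abs(f) < 1e-12:
            break
        fp = sum(8.0 / (eta ** 2 * (Li - x) ** 3) for Li in L)  # f'(x) > 0
        x = min(x - f / fp, min(L) - 1e-12)  # Newton step, kept below min(L)
    return w

def play_round(L, t, loss_fn, rng):
    """One round: sample an arm from the Tsallis-entropy distribution, observe
    its loss, and update the cumulative loss estimates in place."""
    eta = 2.0 / math.sqrt(t)  # eta_t ~ 1/sqrt(t); the constant is an assumption
    w = tsallis_inf_weights(L, eta)
    arm = rng.choices(range(len(L)), weights=w)[0]
    L[arm] += loss_fn(arm) / w[arm]  # unbiased importance-weighted loss estimate
    return arm
```

Because the weights are decreasing in $L_i$, arms with larger estimated cumulative loss are sampled less often, while the $\alpha = \frac{1}{2}$ power keeps enough exploration for the adversarial guarantee.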

Related articles:
arXiv:1910.06054 [cs.LG] (Published 2019-10-14)
An Optimal Algorithm for Adversarial Bandits with Arbitrary Delays
arXiv:1702.06103 [cs.LG] (Published 2017-02-20)
An Improved Parametrization and Analysis of the EXP3++ Algorithm for Stochastic and Adversarial Bandits
arXiv:1802.06357 [cs.LG] (Published 2018-02-18)
Convergence of Online Mirror Descent Algorithms