arXiv:1804.05929 [cs.LG]

UCBoost: A Boosting Approach to Tame Complexity and Optimality for Stochastic Bandits

Fang Liu, Sinong Wang, Swapna Buccapatnam, Ness Shroff

Published 2018-04-16 (Version 1)

In this work, we address the open problem of finding low-complexity, near-optimal multi-armed bandit algorithms for sequential decision-making problems. Existing bandit algorithms are either sub-optimal and computationally simple (e.g., UCB1) or optimal and computationally complex (e.g., kl-UCB). We propose a boosting approach to Upper Confidence Bound (UCB) based algorithms for stochastic bandits, which we call UCBoost. Specifically, we propose two types of UCBoost algorithms. We show that UCBoost($D$) enjoys $O(1)$ complexity for each arm per round as well as a regret guarantee that is $1/e$-close to that of the kl-UCB algorithm. We then propose an approximation-based UCBoost algorithm, UCBoost($\epsilon$), that enjoys a regret guarantee $\epsilon$-close to that of kl-UCB as well as $O(\log(1/\epsilon))$ complexity for each arm per round. Hence, our algorithms offer practitioners a practical way to trade optimality for computational complexity. Finally, we present numerical results showing that UCBoost($\epsilon$) achieves the same regret performance as the standard kl-UCB while incurring only $1\%$ of its computational cost.
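To make the trade-off concrete, here is a minimal Python sketch, not taken from the paper, of the contrast the abstract draws: the kl-UCB index requires an iterative root-finding step per arm, while any closed-form lower bound on the Bernoulli KL divergence (Pinsker's inequality is used here purely as an illustration) yields a valid but looser index in $O(1)$ time; keeping the tightest index over a dictionary of such cheap bounds is the boosting idea in spirit. The exploration budget $\log(t)/n$, the bisection tolerance `tol`, and the one-element dictionary are illustrative assumptions; the paper's actual UCBoost($D$) uses its own family of semi-distances with closed-form solutions.

```python
import math

def kl_bernoulli(p, q):
    """KL divergence between Bernoulli(p) and Bernoulli(q), clamped for stability."""
    eps = 1e-12
    p = min(max(p, eps), 1 - eps)
    q = min(max(q, eps), 1 - eps)
    return p * math.log(p / q) + (1 - p) * math.log((1 - p) / (1 - q))

def kl_ucb_index(p_hat, n, t, tol=1e-6):
    """kl-UCB index: the largest q with n * kl(p_hat, q) <= log(t),
    found by bisection -- the 'optimal but computationally complex' baseline."""
    budget = math.log(t) / n
    lo, hi = p_hat, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if kl_bernoulli(p_hat, mid) <= budget:
            lo = mid  # mid is still feasible; move the bound up
        else:
            hi = mid
    return lo

def pinsker_ucb_index(p_hat, n, t):
    """Closed-form index from Pinsker's inequality kl(p, q) >= 2 * (p - q)^2.
    Any lower bound on kl gives a valid (looser) upper confidence bound."""
    return min(1.0, p_hat + math.sqrt(math.log(t) / (2 * n)))

def boosted_index(p_hat, n, t, dictionary):
    """Boosting idea in spirit: evaluate several cheap closed-form bounds
    and keep the tightest (smallest). `dictionary` lists index functions."""
    return min(f(p_hat, n, t) for f in dictionary)

if __name__ == "__main__":
    p_hat, n, t = 0.3, 50, 1000
    print(kl_ucb_index(p_hat, n, t))                         # iterative, tighter
    print(boosted_index(p_hat, n, t, [pinsker_ucb_index]))   # O(1), looser
```

The point of the minimum over a dictionary is that every lower bound on kl is solvable in closed form yet remains a valid confidence bound, so enriching the dictionary tightens the index toward kl-UCB without ever running a bisection.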

Related articles:
arXiv:2006.10185 [cs.LG] (Published 2020-06-17)
Stochastic Bandits with Linear Constraints
arXiv:1810.12188 [cs.LG] (Published 2018-10-29)
Adversarial Attacks on Stochastic Bandits
arXiv:1910.01161 [cs.LG] (Published 2019-10-02)
Stochastic Bandits with Delayed Composite Anonymous Feedback