arXiv Analytics

arXiv:1905.08165 [stat.ML]

Gradient Ascent for Active Exploration in Bandit Problems

Pierre Ménard

Published 2019-05-20 (Version 1)

We present a new algorithm based on gradient ascent for a general Active Exploration bandit problem in the fixed confidence setting. This problem encompasses several well-studied problems such as Best Arm Identification and Thresholding Bandits. The algorithm relies on a new sampling rule based on online lazy mirror ascent. We prove that this algorithm is asymptotically optimal and, most importantly, computationally efficient.
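To make the sampling rule concrete, here is a minimal sketch of a lazy mirror ascent sampler for the Best Arm Identification instance of the problem, assuming unit-variance Gaussian arms, an entropic mirror map (so the lazy update is a softmax over accumulated gradients), a 1/sqrt(t) step size, a simple forced-exploration mixture, and a heuristic stopping threshold. The paper's actual step sizes, exploration scheme, and threshold differ; all function names below are illustrative, not the author's code.

```python
import numpy as np

def min_alt_value_and_grad(mu_hat, w):
    """Value and supergradient of w -> min over alternatives of the
    weighted divergence, for unit-variance Gaussian Best Arm
    Identification (illustrative decomposition, not the paper's code)."""
    a_star = int(np.argmax(mu_hat))
    best_val, best_grad = np.inf, np.zeros_like(w)
    for b in range(len(mu_hat)):
        if b == a_star:
            continue
        # The closest alternative in which arm b beats a* gives both
        # arms the same weighted-average mean.
        lam = (w[a_star] * mu_hat[a_star] + w[b] * mu_hat[b]) / (w[a_star] + w[b])
        ga = (mu_hat[a_star] - lam) ** 2 / 2.0
        gb = (mu_hat[b] - lam) ** 2 / 2.0
        val = w[a_star] * ga + w[b] * gb
        if val < best_val:
            best_val = val
            best_grad = np.zeros_like(w)
            best_grad[a_star], best_grad[b] = ga, gb  # envelope theorem
    return best_val, best_grad

def lazy_mirror_ascent_bai(mu, delta=0.05, max_rounds=50_000, seed=0):
    """Fixed-confidence BAI via lazy mirror ascent on the arm weights."""
    rng = np.random.default_rng(seed)
    K = len(mu)
    counts = np.ones(K)                      # one initial pull per arm
    sums = rng.normal(mu)                    # rewards from the initial pulls
    cum_grad = np.zeros(K)                   # "lazy" gradient accumulator
    for t in range(K, max_rounds):
        mu_hat = sums / counts
        # Entropic mirror map: weights are a softmax of accumulated gradients.
        w = np.exp(cum_grad - cum_grad.max())
        w /= w.sum()
        _, g = min_alt_value_and_grad(mu_hat, w)
        cum_grad += g / np.sqrt(t)           # assumed 1/sqrt(t) step size
        # Forced exploration: mix with the uniform distribution.
        eps = 0.5 / np.sqrt(t)
        a = rng.choice(K, p=(1.0 - eps) * w + eps / K)
        sums[a] += rng.normal(mu[a])
        counts[a] += 1.0
        # GLR stopping rule; the threshold below is a heuristic stand-in
        # for the calibrated beta(t, delta) of the fixed-confidence setting.
        glr, _ = min_alt_value_and_grad(mu_hat, counts)
        if glr > np.log((1.0 + np.log(t)) / delta):
            return int(np.argmax(mu_hat)), t
    return int(np.argmax(mu_hat)), max_rounds

if __name__ == "__main__":
    arm, n = lazy_mirror_ascent_bai(np.array([0.5, 0.4, 0.3, 0.2]))
    print(f"recommended arm {arm} after {n} pulls")
```

The structural point this sketch illustrates: each round costs only one pass over the K-1 candidate alternatives to get a supergradient, rather than solving the optimal-weights program to convergence, which is consistent with the computational efficiency the abstract emphasizes.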

Related articles:
arXiv:1407.4443 [stat.ML] (Published 2014-07-16, updated 2016-11-14)
On the Complexity of Best Arm Identification in Multi-Armed Bandit Models
arXiv:2301.03785 [stat.ML] (Published 2023-01-10)
Best Arm Identification in Stochastic Bandits: Beyond $\beta$-optimality
arXiv:2308.12000 [stat.ML] (Published 2023-08-23)
On Uniformly Optimal Algorithms for Best Arm Identification in Two-Armed Bandits with Fixed Budget