arXiv:1905.08165 [stat.ML]
Gradient Ascent for Active Exploration in Bandit Problems
Published 2019-05-20 (Version 1)
We present a new algorithm based on gradient ascent for the general Active Exploration bandit problem in the fixed-confidence setting. This problem encompasses several well-studied problems, such as Best Arm Identification and Thresholding Bandits. The algorithm relies on a new sampling rule based on online lazy mirror ascent. We prove that it is asymptotically optimal and, most importantly, computationally efficient.
Comments: 21 pages, 1 figure
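
The abstract leaves the sampling rule at a high level. Below is a minimal, self-contained Python sketch of what an online lazy mirror ascent (dual-averaging) sampling rule over the simplex of arm allocations can look like for Best Arm Identification with Gaussian arms. This is not the paper's actual algorithm: the surrogate gradient, the exploration bonus 1/counts, and the function name lazy_mirror_ascent_bai are illustrative assumptions made for this sketch.

    import numpy as np

    def lazy_mirror_ascent_bai(arms, n_rounds=10_000, eta=0.1, rng=None):
        """Toy sketch of a lazy-mirror-ascent sampling rule for Best Arm
        Identification (Gaussian arms, unit variance). The surrogate
        gradient below is a hypothetical stand-in, not the paper's
        complexity-based gradient."""
        rng = rng or np.random.default_rng(0)
        K = len(arms)
        counts = np.zeros(K)
        means = np.zeros(K)
        grad_sum = np.zeros(K)  # lazy (dual-averaging) gradient accumulator

        # Initialization: pull each arm once.
        for a in range(K):
            means[a] = arms[a](rng)
            counts[a] = 1

        for _ in range(K, n_rounds):
            # Allocation = mirror map applied to the accumulated gradients
            # (softmax, i.e. the negative-entropy mirror map on the simplex).
            z = eta * grad_sum
            w = np.exp(z - z.max())
            w /= w.sum()

            # Hypothetical surrogate gradient: arms with a small empirical
            # gap to the leader, or few pulls, get a larger ascent direction.
            leader = means.argmax()
            gaps = means[leader] - means
            gaps[leader] = gaps[np.arange(K) != leader].min()
            grad = -0.5 * gaps**2 + 1.0 / counts  # exploration bonus

            grad_sum += grad

            # Sample an arm from the current allocation; update statistics.
            a = rng.choice(K, p=w)
            x = arms[a](rng)
            counts[a] += 1
            means[a] += (x - means[a]) / counts[a]

        return means.argmax(), counts

    # Example: three Gaussian arms with means 0.5, 0.4, 0.1.
    arms = [lambda r, m=m: r.normal(m, 1.0) for m in (0.5, 0.4, 0.1)]
    best, counts = lazy_mirror_ascent_bai(arms)
    print("guessed best arm:", best, "pull counts:", counts)

The "lazy" (dual-averaging) form only accumulates gradients and applies the mirror map once per round, which is what makes this family of sampling rules cheap compared to re-solving the optimal-allocation optimization problem at every step, consistent with the computational-efficiency claim in the abstract.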