arXiv:1702.06103 [cs.LG]
An Improved Parametrization and Analysis of the EXP3++ Algorithm for Stochastic and Adversarial Bandits
Published 2017-02-20, Version 1
We present a new strategy for gap estimation in randomized algorithms for multiarmed bandits and combine it with the EXP3++ algorithm of Seldin and Slivkins (2014). In the stochastic regime, the strategy reduces the dependence of the regret on the time horizon from $(\ln t)^3$ to $(\ln t)^2$ and replaces an additive term of order $\Delta e^{1/\Delta^2}$ with an additive term of order $1/\Delta^7$, where $\Delta$ is the minimal gap of the problem instance. In the adversarial regime, the regret guarantee remains unchanged.
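To get a feel for the two additive terms quoted in the abstract, the following minimal sketch evaluates $\Delta e^{1/\Delta^2}$ and $1/\Delta^7$ for a few gap values. The function names and the chosen gaps are illustrative assumptions, and all constants hidden by the "of order" statements are dropped, so only the growth rates are meaningful, not the absolute numbers.

```python
import math


def old_additive_term(gap: float) -> float:
    """Order of the earlier additive term, gap * exp(1 / gap^2); constants dropped."""
    return gap * math.exp(1.0 / gap**2)


def new_additive_term(gap: float) -> float:
    """Order of the improved additive term, 1 / gap^7; constants dropped."""
    return 1.0 / gap**7


# Hypothetical gap values, kept at 0.1 or above so that exp() does not overflow.
for gap in (0.5, 0.2, 0.1):
    print(
        f"gap={gap}: old ~ {old_additive_term(gap):.3e}, "
        f"new ~ {new_additive_term(gap):.3e}"
    )
```

With constants ignored, the polynomial term is the larger of the two at a gap of 0.5, while the exponential term dominates by many orders of magnitude once the gap shrinks to 0.1. This suggests the replacement matters most for problem instances with small gaps.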
Related articles:
arXiv:1704.04470 [cs.LG] (Published 2017-04-14)
Lean From Thy Neighbor: Stochastic & Adversarial Bandits in a Network
arXiv:1807.07623 [cs.LG] (Published 2018-07-19)
An Optimal Algorithm for Stochastic and Adversarial Bandits
arXiv:1910.06054 [cs.LG] (Published 2019-10-14)
An Optimal Algorithm for Adversarial Bandits with Arbitrary Delays