arXiv Analytics

arXiv:1807.03558 [cs.LG]

Bandits with Side Observations: Bounded vs. Logarithmic Regret

Rémy Degenne, Evrard Garcelon, Vianney Perchet

Published 2018-07-10 (Version 1)

We consider the classical stochastic multi-armed bandit problem where, from time to time and roughly with frequency $\epsilon$, an extra observation is gathered by the agent for free. We prove that, no matter how small $\epsilon$ is, the agent can ensure a regret uniformly bounded in time. More precisely, we construct an algorithm with a regret smaller than $\sum_i \frac{\log(1/\epsilon)}{\Delta_i}$, up to multiplicative constants and $\log\log$ terms. We also prove a matching lower bound, stating that no reasonable algorithm can outperform this quantity.
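To make the setting concrete, here is a minimal simulation sketch of the protocol the abstract describes: a stochastic bandit in which, after each round, a free extra observation arrives with probability $\epsilon$. The agent below runs plain UCB1 as a stand-in learner; the Bernoulli rewards, the uniformly random choice of the freely observed arm, and the per-round coin flip for the side observation are illustrative assumptions, not the paper's exact mechanism or algorithm.

```python
import math
import random

def simulate(mus, eps, horizon, seed=0):
    """Sketch: stochastic multi-armed bandit with free side observations.

    mus     -- assumed Bernoulli means of the arms (illustrative choice)
    eps     -- probability of a free extra observation each round
    horizon -- number of rounds
    Returns the cumulative (pseudo-)regret of a plain UCB1 learner.
    """
    rng = random.Random(seed)
    k = len(mus)
    counts = [0] * k
    sums = [0.0] * k

    def observe(arm):
        # Draw a Bernoulli reward and record the observation.
        r = 1.0 if rng.random() < mus[arm] else 0.0
        counts[arm] += 1
        sums[arm] += r

    best = max(mus)
    regret = 0.0
    for t in range(1, horizon + 1):
        if 0 in counts:
            arm = counts.index(0)  # initialisation: try each arm once
        else:
            # Standard UCB1 index (stand-in for the paper's algorithm)
            arm = max(range(k), key=lambda i: sums[i] / counts[i]
                      + math.sqrt(2.0 * math.log(t) / counts[i]))
        observe(arm)
        regret += best - mus[arm]
        # Free observation, roughly with frequency eps: it updates the
        # statistics of a uniformly chosen arm but costs no regret.
        if rng.random() < eps:
            observe(rng.randrange(k))
    return regret
```

The point of the sketch is the last three lines: side observations enrich the statistics without being charged to the regret, which is what lets the paper's algorithm keep the total regret bounded uniformly in time, of order $\sum_i \log(1/\epsilon)/\Delta_i$.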

Comments: Conference on Uncertainty in Artificial Intelligence (UAI) 2018, 21 pages
Categories: cs.LG, stat.ML
Related articles:
arXiv:2006.12038 [cs.LG] (Published 2020-06-22)
Bandit algorithms: Letting go of logarithmic regret for statistical robustness
arXiv:1805.07430 [cs.LG] (Published 2018-05-18)
Efficient Online Portfolio with Logarithmic Regret
arXiv:2305.19691 [cs.LG] (Published 2023-05-31)
Constant or logarithmic regret in asynchronous multiplayer bandits