arXiv Analytics

arXiv:2112.06517 [cs.LG]

Top $K$ Ranking for Multi-Armed Bandit with Noisy Evaluations

Evrard Garcelon, Vashist Avadhanula, Alessandro Lazaric, Matteo Pirotta

Published 2021-12-13, updated 2022-03-23 (Version 3)

We consider a multi-armed bandit setting where, at the beginning of each round, the learner receives noisy, independent, and possibly biased \emph{evaluations} of the true reward of each arm and selects $K$ arms with the objective of accumulating as much reward as possible over $T$ rounds. Under the assumption that at each round the true reward of each arm is drawn from a fixed distribution, we derive different algorithmic approaches and theoretical guarantees depending on how the evaluations are generated. First, we show a $\widetilde{O}(T^{2/3})$ regret bound in the general case where the observation functions are generalized linear functions of the true rewards. On the other hand, we show that an improved $\widetilde{O}(\sqrt{T})$ regret bound can be derived when the observation functions are noisy linear functions of the true rewards. Finally, we report an empirical validation that confirms our theoretical findings, provides a thorough comparison to alternative approaches, and further supports the interest of this setting in practice.
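To make the interaction protocol concrete, here is a minimal simulation sketch of the setting as described in the abstract, not the authors' algorithm: per-arm rewards are drawn each round from fixed distributions, the learner sees noisy linear evaluations of them, ranks the evaluations, and keeps the top $K$ arms. The Gaussian reward model and the slope, offset, and noise parameters below are illustrative assumptions.

```python
import numpy as np

# Illustrative simulation of the noisy-evaluation top-K bandit protocol.
# All numeric parameters are assumptions, not values from the paper.

rng = np.random.default_rng(0)
n_arms, K, T = 10, 3, 5000
mu = rng.uniform(0.0, 1.0, size=n_arms)      # fixed mean reward of each arm
a, b, eval_noise = 1.5, 0.2, 0.5             # assumed (unknown to the learner) linear observation model

regret = 0.0
for t in range(T):
    r = mu + 0.1 * rng.standard_normal(n_arms)                     # true rewards this round
    evals = a * r + b + eval_noise * rng.standard_normal(n_arms)   # noisy linear evaluations
    chosen = np.argsort(evals)[-K:]                                # rank evaluations, keep top K
    best = np.argsort(r)[-K:]                                      # oracle top-K on true rewards
    regret += r[best].sum() - r[chosen].sum()

print(f"cumulative regret over {T} rounds: {regret:.1f}")
```

Ranking directly by the raw evaluations is only a naive baseline; the paper's point is that learning how the evaluations relate to the true rewards (generalized linear vs. noisy linear observation functions) yields the $\widetilde{O}(T^{2/3})$ and $\widetilde{O}(\sqrt{T})$ regret guarantees, respectively.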

Related articles:
arXiv:2211.06883 [cs.LG] (Published 2022-11-13)
Generalizing distribution of partial rewards for multi-armed bandits with temporally-partitioned rewards
arXiv:1911.05142 [cs.LG] (Published 2019-11-12)
Incentivized Exploration for Multi-Armed Bandits under Reward Drift
arXiv:1911.09458 [cs.LG] (Published 2019-11-21)
Observe Before Play: Multi-armed Bandit with Pre-observations