arXiv Analytics

arXiv:2211.06883 [cs.LG]

Generalizing distribution of partial rewards for multi-armed bandits with temporally-partitioned rewards

Ronald C. van den Broek, Rik Litjens, Tobias Sagis, Luc Siecker, Nina Verbeeke, Pratik Gajane

Published 2022-11-13, Version 1

In this paper, we investigate the Multi-Armed Bandit setting with Temporally-Partitioned Rewards (TP-MAB). In the TP-MAB setting, an agent receives an arm's reward in portions spread over multiple rounds rather than all at once. We introduce a general formulation of how an arm's cumulative reward is distributed across rounds, called the Beta-spread property. Such a generalization is needed to handle partitioned rewards in which the maximum reward per round is not uniform across rounds. We derive a lower bound on the regret of the TP-MAB problem under the assumption that the Beta-spread property holds. Moreover, we provide an algorithm, TP-UCB-FR-G, which uses the Beta-spread property to improve the regret upper bound in some scenarios. By generalizing how the cumulative reward is distributed, this setting becomes applicable to a broader range of problems.
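To make the TP-MAB setting concrete, here is a minimal Python sketch of an environment in which each pull's cumulative reward is delivered in fractions over the next tau_max rounds. The Beta-shaped weighting in beta_weights is an illustrative assumption inspired by the "Beta-spread" terminology, not the paper's formal definition, and all names (TPMABEnv, beta_weights, mu, tau_max) are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def beta_weights(tau_max: int, a: float = 2.0, b: float = 5.0) -> np.ndarray:
    """Discretize a Beta(a, b) density over tau_max rounds and normalize,
    so partial rewards need not be spread uniformly across rounds.
    (Illustrative choice; the paper's Beta-spread property may differ.)"""
    grid = (np.arange(tau_max) + 0.5) / tau_max
    density = grid ** (a - 1) * (1 - grid) ** (b - 1)  # unnormalized Beta pdf
    return density / density.sum()

class TPMABEnv:
    """Toy TP-MAB environment: pulling arm k draws a Bernoulli(mu[k])
    cumulative reward, then releases it in portions over tau_max rounds."""

    def __init__(self, mu, tau_max: int = 10):
        self.mu = np.asarray(mu)
        self.weights = beta_weights(tau_max)
        self.pending = []  # queues of partial rewards still to be delivered

    def pull(self, k: int) -> float:
        total = float(rng.random() < self.mu[k])         # cumulative reward
        self.pending.append(list(total * self.weights))  # schedule partials
        # Collect the partial rewards that arrive in the current round.
        received = sum(q.pop(0) for q in self.pending)
        self.pending = [q for q in self.pending if q]
        return received

env = TPMABEnv(mu=[0.3, 0.7], tau_max=5)
print([round(env.pull(1), 3) for _ in range(8)])  # partial rewards trickle in
```

With a uniform weight vector this reduces to the uniformly-partitioned case; a skewed Beta shape illustrates why an algorithm that exploits the spread, as TP-UCB-FR-G is described as doing, can tighten its upper confidence bounds.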

Related articles:
arXiv:2307.07264 [cs.LG] (Published 2023-07-14)
On Interpolating Experts and Multi-Armed Bandits
arXiv:2112.06517 [cs.LG] (Published 2021-12-13, updated 2022-03-23)
Top $K$ Ranking for Multi-Armed Bandit with Noisy Evaluations
arXiv:1911.05142 [cs.LG] (Published 2019-11-12)
Incentivized Exploration for Multi-Armed Bandits under Reward Drift