arXiv Analytics


arXiv:2410.15003 [math.OC]

Achieving O(1/N) Optimality Gap in Restless Bandits through Diffusion Approximation

Chen Yan, Weina Wang, Lei Ying

Published 2024-10-19, Version 1

We study the finite horizon Restless Multi-Armed Bandit (RMAB) problem with $N$ homogeneous arms, focusing on the challenges posed by degenerate RMABs, which are prevalent in practical applications. While previous work has shown that Linear Programming (LP)-based policies achieve exponentially fast convergence relative to the LP upper bound in non-degenerate models, applying these LP-based policies to degenerate RMABs results in slower convergence rates of $O(1/\sqrt{N})$. We construct a diffusion system that incorporates both the mean and variance of the stochastic processes, in contrast to the fluid system from the LP, which only accounts for the mean, thereby providing a more accurate representation of RMAB dynamics. Consequently, our novel diffusion-resolving policy achieves an optimality gap of $O(1/N)$ relative to the true optimal value, rather than the LP upper bound, revealing that the fluid approximation and the LP upper bound are too loose in degenerate settings. These insights pave the way for constructing policies that surpass the $O(1/\sqrt{N})$ optimality gap for any RMAB, whether degenerate or not.
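For context on the LP upper bound referenced above, the following is a minimal sketch of the standard fluid LP relaxation for a finite-horizon RMAB with $N$ homogeneous arms. The notation here (state space $\mathcal{S}$, binary action $a \in \{0,1\}$, horizon $T$, per-period activation fraction $\alpha$, reward $r$, transition kernel $P$, initial distribution $\mu_0$) is illustrative and may differ from the paper's own formulation.

\[
\begin{aligned}
V^{\mathrm{LP}} \;=\; \max_{y \,\ge\, 0} \;\; & \sum_{t=1}^{T} \sum_{s \in \mathcal{S}} \sum_{a \in \{0,1\}} r(s,a)\, y_t(s,a) \\
\text{s.t.} \;\; & \sum_{s \in \mathcal{S}} y_t(s,1) \,=\, \alpha, \qquad t = 1,\dots,T, \\
& \sum_{a} y_{t+1}(s',a) \,=\, \sum_{s,\,a} P(s' \mid s,a)\, y_t(s,a), \qquad t = 1,\dots,T-1,\; s' \in \mathcal{S}, \\
& \sum_{a} y_1(s,a) \,=\, \mu_0(s), \qquad s \in \mathcal{S},
\end{aligned}
\]

where $y_t(s,a)$ denotes the fraction of arms in state $s$ taking action $a$ at time $t$. The scaled value $N\,V^{\mathrm{LP}}$ upper-bounds the optimal $N$-arm reward; this fluid system tracks only these mean fractions, whereas the diffusion system described in the abstract additionally accounts for their fluctuations around the fluid trajectory.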

Related articles:
arXiv:1804.02100 [math.OC] (Published 2018-04-06, updated 2019-07-23)
Restless Bandits in Action: Resource Allocation, Competition and Reservation
arXiv:1102.3508 [math.OC] (Published 2011-02-17)
Online Learning of Rested and Restless Bandits
arXiv:1610.00399 [math.OC] (Published 2016-10-03)
Deadline Scheduling as Restless Bandits