arXiv:2406.16802 Abstract | arXiv Analytics

arXiv:2406.16802 [cs.LG]Abstract References Reviews Resources

Improved Regret Bounds for Bandits with Expert Advice

Nicolò Cesa-Bianchi, Khaled Eldowa, Emmanuel Esposito, Julia Olkhovskaya

Published 2024-06-24Version 1

In this research note, we revisit the bandits with expert advice problem. Under a restricted feedback model, we prove a lower bound of order $\sqrt{K T \ln(N/K)}$ for the worst-case regret, where $K$ is the number of actions, $N>K$ the number of experts, and $T$ the time horizon. This matches a previously known upper bound of the same order and improves upon the best available lower bound of $\sqrt{K T (\ln N) / (\ln K)}$. For the standard feedback model, we prove a new instance-based upper bound that depends on the agreement between the experts and provides a logarithmic improvement compared to prior results.

Categories: cs.LG, stat.ML

Keywords: regret bounds, lower bound, expert advice problem, standard feedback model, restricted feedback model

Related articles: Most relevant | Search more

arXiv:2106.11692 [cs.LG] (Published 2021-06-22)

A Unified Framework for Conservative Exploration

Yunchang Yang et al.

arXiv:1806.02970 [cs.LG] (Published 2018-06-08)

PAC Ranking from Pairwise and Listwise Queries: Lower Bounds and Upper Bounds

Wenbo Ren, Jia Liu, Ness B. Shroff

arXiv:2102.04939 [cs.LG] (Published 2021-02-09)

RL for Latent MDPs: Regret Guarantees and a Lower Bound

Jeongyeol Kwon, Yonathan Efroni, Constantine Caramanis, Shie Mannor

arXiv Analytics

arXiv:2406.16802 [cs.LG]Abstract References Reviews Resources

Improved Regret Bounds for Bandits with Expert Advice

Links

Toolbox

arXiv:2406.16802 [cs.LG]AbstractReferencesReviewsResources

Improved Regret Bounds for Bandits with Expert Advice

Links

Toolbox

arXiv:2406.16802 [cs.LG]Abstract References Reviews Resources