arXiv:2211.16110 [cs.LG]

PAC-Bayes Bounds for Bandit Problems: A Survey and Experimental Comparison

Hamish Flynn, David Reeb, Melih Kandemir, Jan Peters

Published 2022-11-29 (Version 1)

PAC-Bayes has recently re-emerged as an effective theory from which one can derive principled learning algorithms with tight performance guarantees. However, applications of PAC-Bayes to bandit problems are relatively rare, which is unfortunate: many decision-making problems in healthcare, finance and the natural sciences can be modelled as bandit problems, and in many of these applications principled algorithms with strong performance guarantees would be highly valuable. This survey provides an overview of PAC-Bayes performance bounds for bandit problems and an experimental comparison of these bounds. Our experimental comparison reveals that the available PAC-Bayes upper bounds on cumulative regret are loose, whereas the available PAC-Bayes lower bounds on expected reward can be surprisingly tight. We found that an offline contextual bandit algorithm that learns a policy by optimising a PAC-Bayes bound was able to learn randomised neural network policies with competitive expected reward and non-vacuous performance guarantees.
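To make the flavour of such guarantees concrete, below is a minimal sketch of a McAllester-style PAC-Bayes bound, rearranged as a lower bound on expected reward; it is a representative textbook form, not the paper's exact statement, and the symbols (empirical reward estimate \hat{R}_n, prior P, posterior Q, confidence level \delta) are generic placeholders rather than the survey's notation.

% Illustrative McAllester-style PAC-Bayes bound (an assumption-laden sketch,
% not the paper's exact result): with probability at least 1 - \delta over an
% i.i.d. sample of size n, simultaneously for all posteriors Q over policies,
% given a fixed prior P and reward estimates bounded in [0, 1]:
\[
\mathbb{E}_{h \sim Q}\!\left[ R(h) \right]
\;\ge\;
\mathbb{E}_{h \sim Q}\!\left[ \hat{R}_n(h) \right]
\;-\;
\sqrt{\frac{\mathrm{KL}(Q \,\|\, P) + \ln\!\left( 2\sqrt{n}/\delta \right)}{2n}}
\]
% An algorithm of the kind described in the abstract would maximise the
% right-hand side over Q, trading empirical reward against the KL term;
% the resulting value of the bound is itself a performance certificate.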

Related articles:
arXiv:1301.7401 [cs.LG] (Published 2013-01-30, updated 2015-05-16)
An Experimental Comparison of Several Clustering and Initialization Methods
arXiv:2102.03748 [cs.LG] (Published 2021-02-07)
PAC-Bayes Bounds for Meta-learning with Data-Dependent Prior
arXiv:1202.3750 [cs.LG] (Published 2012-02-14)
Fractional Moments on Bandit Problems