arXiv Analytics

arXiv:2210.13132 [stat.ML]

PAC-Bayesian Offline Contextual Bandits With Guarantees

Otmane Sakhi, Nicolas Chopin, Pierre Alquier

Published 2022-10-24, Version 1

This paper introduces a new principled approach to offline policy optimisation in contextual bandits. For two well-established risk estimators, we propose novel generalisation bounds able to confidently improve upon the logging policy offline. Unlike previous work, our approach does not require tuning hyperparameters on held-out sets and enables deployment with no prior A/B testing. This is achieved by analysing the problem through the PAC-Bayesian lens: notably, we let go of traditional policy parametrisation (e.g. softmax) and instead interpret policies as mixtures of deterministic strategies. Through extensive experiments, we demonstrate the tightness of our bounds and the effectiveness of our approach in practical scenarios.

Related articles:
arXiv:2102.02504 [stat.ML] (Published 2021-02-04)
Meta-strategy for Learning Tuning Parameters with Guarantees
arXiv:2402.08508 [stat.ML] (Published 2024-02-13, updated 2025-02-11)
A PAC-Bayesian Link Between Generalisation and Flat Minima
arXiv:2405.05025 [stat.ML] (Published 2024-05-08)
Learning Structural Causal Models through Deep Generative Models: Methods, Guarantees, and Challenges