arXiv Analytics

arXiv:2210.13132 [stat.ML]

PAC-Bayesian Offline Contextual Bandits With Guarantees

Otmane Sakhi, Nicolas Chopin, Pierre Alquier

Published 2022-10-24, Version 1

This paper introduces a new principled approach to offline policy optimisation in contextual bandits. For two well-established risk estimators, we propose novel generalisation bounds able to confidently improve upon the logging policy offline. Unlike previous work, our approach does not require tuning hyperparameters on held-out sets and enables deployment with no prior A/B testing. This is achieved by analysing the problem through the PAC-Bayesian lens: notably, we let go of traditional policy parametrisation (e.g. softmax) and instead interpret policies as mixtures of deterministic strategies. Through extensive experiments, we demonstrate the tightness of our bounds and the effectiveness of our approach in practical scenarios.

Related articles:
arXiv:2102.02504 [stat.ML] (Published 2021-02-04)
Meta-strategy for Learning Tuning Parameters with Guarantees
arXiv:2402.08508 [stat.ML] (Published 2024-02-13, updated 2025-02-11)
A PAC-Bayesian Link Between Generalisation and Flat Minima
arXiv:2405.05025 [stat.ML] (Published 2024-05-08)
Learning Structural Causal Models through Deep Generative Models: Methods, Guarantees, and Challenges