arXiv:1406.6812 [cs.LG]
Online learning in MDPs with side information
Yasin Abbasi-Yadkori, Gergely Neu
Published 2014-06-26Version 1
We study online learning of finite Markov decision process (MDP) problems when a side-information vector is available. The problem is motivated by applications such as clinical trials and recommendation systems, which have an episodic structure in which each episode corresponds to a patient or customer. Our objective is to compete with the optimal dynamic policy that can take the side information into account. We propose a computationally efficient algorithm and show that its regret is at most $O(\sqrt{T})$, where $T$ is the number of rounds. To the best of our knowledge, this is the first regret bound for this setting.
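The regret criterion described above can be sketched in the standard episodic contextual form; the notation below is our own illustrative formalization, not necessarily the paper's:

```latex
% Hypothetical formalization of the regret in the abstract.
% x_t : side-information vector observed at the start of episode t
% \pi_t : the policy the learner runs in episode t
% V^{\pi}(x) : expected return of policy \pi in the MDP associated with context x
% The comparator is the optimal context-dependent (dynamic) policy \pi^*(\cdot).
\[
  R_T \;=\; \sum_{t=1}^{T} \Bigl( V^{\pi^*(x_t)}(x_t) \;-\; V^{\pi_t}(x_t) \Bigr),
  \qquad
  \pi^*(x) \;\in\; \arg\max_{\pi} V^{\pi}(x).
\]
```

Under this reading, the claimed guarantee is $R_T = O(\sqrt{T})$ against the context-dependent comparator, which is strictly stronger than competing with a single fixed policy.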
Related articles:
arXiv:1711.03343 [cs.LG] (Published 2017-11-09)
Analysis of Dropout in Online Learning
arXiv:2009.11942 [cs.LG] (Published 2020-09-24)
Online Learning With Adaptive Rebalancing in Nonstationary Environments
arXiv:2007.05665 [cs.LG] (Published 2020-07-11)
A Computational Separation between Private Learning and Online Learning