arXiv Analytics


arXiv:2102.04939 [cs.LG]

RL for Latent MDPs: Regret Guarantees and a Lower Bound

Jeongyeol Kwon, Yonathan Efroni, Constantine Caramanis, Shie Mannor

Published 2021-02-09 (Version 1)

In this work, we consider the regret minimization problem for reinforcement learning in latent Markov Decision Processes (LMDPs). In an LMDP, an MDP is randomly drawn from a set of $M$ possible MDPs at the beginning of the interaction, but the identity of the chosen MDP is not revealed to the agent. We first show that a general instance of LMDPs requires at least $\Omega((SA)^M)$ episodes to even approximate the optimal policy. Then, we consider sufficient assumptions under which learning good policies requires only a polynomial number of episodes. We show that the key property is a notion of separation between the system dynamics of the MDPs. With sufficient separation, we provide an efficient algorithm with a local guarantee, {\it i.e.,} one achieving sublinear regret when given a good initialization. Finally, under standard statistical sufficiency assumptions common in the Predictive State Representation (PSR) literature (e.g., Boots et al.) together with a reachability assumption, we show that the need for initialization can be removed.
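To make the interaction protocol concrete, here is a minimal illustrative sketch (not the paper's algorithm) of an LMDP episode: one of $M$ tabular MDPs over shared state and action spaces of sizes $S$ and $A$ is drawn at the start of the episode, and the agent's policy never observes which one. The class name, array layout, and mixing weights are hypothetical choices for the example.

```python
import numpy as np

class LatentMDP:
    """Illustrative LMDP simulator: M tabular MDPs with hidden episode-level context."""

    def __init__(self, transitions, rewards, weights, horizon, rng=None):
        # transitions: shape (M, S, A, S); rewards: shape (M, S, A); weights: shape (M,)
        self.P = np.asarray(transitions)
        self.R = np.asarray(rewards)
        self.w = np.asarray(weights)
        self.H = horizon
        self.rng = rng if rng is not None else np.random.default_rng()

    def run_episode(self, policy):
        """Roll out one episode; the latent index m is drawn once and stays hidden."""
        M, S, A, _ = self.P.shape
        m = self.rng.choice(M, p=self.w)   # latent MDP sampled at episode start
        s, total_reward = 0, 0.0
        for h in range(self.H):
            a = policy(s, h)               # policy sees only (state, step), never m
            total_reward += self.R[m, s, a]
            s = self.rng.choice(S, p=self.P[m, s, a])
        return total_reward
```

A regret-minimizing learner would interact with such an environment over many episodes, observing only the trajectories it generates; the hardness result above reflects that, without separation assumptions, identifying near-optimal behavior across the $M$ hidden contexts can require exponentially many such episodes.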

Related articles:
arXiv:2106.11692 [cs.LG] (Published 2021-06-22)
A Unified Framework for Conservative Exploration
arXiv:1806.02970 [cs.LG] (Published 2018-06-08)
PAC Ranking from Pairwise and Listwise Queries: Lower Bounds and Upper Bounds
arXiv:2210.11604 [cs.LG] (Published 2022-10-20)
Horizon-Free Reinforcement Learning for Latent Markov Decision Processes