arXiv Analytics

arXiv:1807.03858 [cs.LG]

Algorithmic Framework for Model-based Reinforcement Learning with Theoretical Guarantees

Huazhe Xu, Yuanzhi Li, Yuandong Tian, Trevor Darrell, Tengyu Ma

Published 2018-07-10 (Version 1)

While model-based reinforcement learning has been shown empirically to significantly reduce the sample complexity that hinders model-free RL, the theoretical understanding of such methods has been rather limited. In this paper, we introduce a novel algorithmic framework for designing and analyzing model-based RL algorithms with theoretical guarantees, together with a practical algorithm, Optimistic Lower Bounds Optimization (OLBO). In particular, we derive a theoretical guarantee of monotone improvement for model-based RL with our framework. We iteratively build a lower bound of the expected reward based on the estimated dynamical model and sampled trajectories, and maximize it jointly over the policy and the model. Assuming the optimization in each iteration succeeds, the expected reward is guaranteed to improve. The framework also incorporates an optimism-driven perspective and reveals the intrinsic measure for the model prediction error. Preliminary simulations demonstrate that our approach outperforms standard baselines on continuous control benchmark tasks.
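Below is a minimal, illustrative sketch of the iterate-and-maximize loop the abstract describes: sample real trajectories, then jointly optimize the policy and the dynamics model against a lower bound on the expected reward (here a model-based return estimate minus a model prediction-error penalty). The toy 1-D environment, the linear dynamics model, the random-search optimizer, and the penalty weight `LAM` are all hypothetical stand-ins for illustration, not the paper's actual OLBO algorithm or its benchmarks.

```python
# Toy sketch of lower-bound maximization over (policy, model); assumptions noted above.
import numpy as np

rng = np.random.default_rng(0)
HORIZON, LAM = 20, 1.0  # rollout length and model-error penalty weight (assumed values)

def real_step(s, a):
    """Unknown 'true' dynamics: drift toward the action, reward for staying near 0."""
    s_next = 0.9 * s + 0.5 * a + 0.05 * rng.normal()
    return s_next, -s_next ** 2

def rollout_real(policy_k, n=5):
    """Sample (s, a, s') transitions from the real environment with the current policy."""
    trans, ret = [], 0.0
    for _ in range(n):
        s = rng.normal()
        for _ in range(HORIZON):
            a = policy_k * s
            s_next, r = real_step(s, a)
            trans.append((s, a, s_next))
            ret += r
            s = s_next
    return trans, ret / n

def lower_bound(policy_k, model_w, trans):
    """Model-based return estimate minus a penalty on the model's prediction error,
    a crude proxy for a lower bound on the true expected reward."""
    # model prediction error measured on the sampled real transitions
    err = np.mean([(model_w[0] * s + model_w[1] * a - s_next) ** 2
                   for s, a, s_next in trans])
    # imagined return under the learned model
    s, ret = 0.5, 0.0
    for _ in range(HORIZON):
        a = policy_k * s
        s = model_w[0] * s + model_w[1] * a
        ret += -s ** 2
    return ret - LAM * err

policy_k, model_w = 0.0, np.array([0.5, 0.0])
for it in range(10):
    trans, avg_ret = rollout_real(policy_k)   # step 1: sample real trajectories
    best = lower_bound(policy_k, model_w, trans)
    for _ in range(200):                      # step 2: jointly perturb policy and model
        k = policy_k + 0.1 * rng.normal()
        w = model_w + 0.1 * rng.normal(size=2)
        lb = lower_bound(k, w, trans)
        if lb > best:                         # keep only updates that raise the bound
            best, policy_k, model_w = lb, k, w
    print(f"iter {it}: real return {avg_ret:.2f}, lower bound {best:.2f}")
```

Under the monotone-improvement argument in the abstract, each accepted update raises the lower bound, and hence (when the bound is valid) the true expected reward; the random search here merely stands in for whatever joint optimizer one would actually use.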

Related articles:
arXiv:2106.14080 [cs.LG] (Published 2021-06-26)
Model-Advantage Optimization for Model-Based Reinforcement Learning
arXiv:2411.11511 [cs.LG] (Published 2024-11-18)
Structure learning with Temporal Gaussian Mixture for model-based Reinforcement Learning
arXiv:2009.00690 [cs.LG] (Published 2020-09-01)
Improved Bilevel Model: Fast and Optimal Algorithm with Theoretical Guarantee