arXiv:2006.16712 [cs.LG]

Model-based Reinforcement Learning: A Survey

Thomas M. Moerland, Joost Broekens, Catholijn M. Jonker

Published 2020-06-30 (Version 1)

Sequential decision making, commonly formalized as Markov Decision Process (MDP) optimization, is a key challenge in artificial intelligence. Two key approaches to this problem are reinforcement learning (RL) and planning. This paper presents a survey of the integration of both fields, better known as model-based reinforcement learning. Model-based RL has two main steps: learning a dynamics model, and using that model for planning. First, we systematically cover approaches to dynamics model learning, including challenges like dealing with stochasticity, uncertainty, partial observability, and temporal abstraction. Second, we present a systematic categorization of planning-learning integration, including aspects like where to start planning, what budgets to allocate to planning and real data collection, how to plan, and how to integrate planning into the learning and acting loop. After these two key sections, we also discuss the potential benefits of model-based RL, like enhanced data efficiency, targeted exploration, and improved stability. Throughout the survey, we also draw connections to several related RL fields, like hierarchical RL and transfer learning, and to other research disciplines, like behavioural psychology. Altogether, the survey presents a broad conceptual overview of planning-learning combinations for MDP optimization.
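
To make the two steps concrete, below is a minimal Dyna-style sketch: an agent learns a tabular dynamics model from real transitions, then reuses simulated samples from that model for extra value updates. This is one illustrative instance of planning-learning integration, not the survey's own method; the chain environment, the hyperparameters, and all names are assumptions made for the example.

```python
import random
from collections import defaultdict

# Minimal Dyna-Q sketch (illustrative only): learn a dynamics model from
# real experience, then plan with simulated transitions from that model.

N_STATES, ACTIONS = 6, [0, 1]  # tiny chain environment; actions: left/right

def step(s, a):
    """Deterministic chain: reward 1 only when reaching the right end."""
    s2 = max(0, s - 1) if a == 0 else min(N_STATES - 1, s + 1)
    r = 1.0 if s2 == N_STATES - 1 else 0.0
    return s2, r, s2 == N_STATES - 1

Q = defaultdict(float)          # state-action values
model = {}                      # learned dynamics model: (s, a) -> (s', r)
alpha, gamma, eps, n_plan = 0.1, 0.95, 0.1, 10  # assumed hyperparameters

def q_update(s, a, r, s2):
    best = max(Q[(s2, b)] for b in ACTIONS)
    Q[(s, a)] += alpha * (r + gamma * best - Q[(s, a)])

for episode in range(200):
    s, done = 0, False
    while not done:
        # epsilon-greedy action selection in the real environment
        a = random.choice(ACTIONS) if random.random() < eps else \
            max(ACTIONS, key=lambda b: Q[(s, b)])
        s2, r, done = step(s, a)
        q_update(s, a, r, s2)        # (1) learn from the real transition
        model[(s, a)] = (s2, r)      # (2) update the dynamics model
        for _ in range(n_plan):      # (3) plan: replay simulated samples
            ps, pa = random.choice(list(model))
            ps2, pr = model[(ps, pa)]
            q_update(ps, pa, pr, ps2)
        s = s2

print({s: max(Q[(s, b)] for b in ACTIONS) for s in range(N_STATES)})
```

Here the planning budget n_plan directly trades off real data collection against model-based updates, one of the budget-allocation questions the survey categorizes.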
