arXiv:2107.08966 [cs.LG]

Decoupling Exploration and Exploitation in Reinforcement Learning

Lukas Schäfer, Filippos Christianos, Josiah Hanna, Stefano V. Albrecht

Published 2021-07-19, Version 1

Intrinsic rewards are commonly applied to improve exploration in reinforcement learning. However, these approaches suffer from instability caused by non-stationary reward shaping and a strong dependence on hyperparameters. In this work, we propose Decoupled RL (DeRL), which trains separate policies for exploration and exploitation. DeRL can be applied with on-policy and off-policy RL algorithms. We evaluate DeRL algorithms in two sparse-reward environments with multiple types of intrinsic rewards. We show that DeRL is more robust to the scaling and speed of decay of intrinsic rewards and converges to the same evaluation returns as intrinsically motivated baselines in fewer interactions.
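To make the decoupling idea from the abstract concrete, the following is a minimal, hypothetical sketch (not the authors' implementation, which is not specified here): it assumes a toy sparse-reward chain environment, a count-based intrinsic bonus, and tabular Q-learning, with an exploration learner (q_explore) trained on extrinsic plus intrinsic reward and an exploitation learner (q_exploit) updated off-policy on the same transitions using extrinsic reward only. All names and hyperparameters are illustrative assumptions.

```python
# Illustrative sketch of decoupled exploration/exploitation training.
# NOT the DeRL implementation from the paper; toy environment and
# count-based bonus are assumptions for demonstration only.
import random
from collections import defaultdict

N_STATES, GOAL, EPISODES, HORIZON = 10, 9, 300, 30
ALPHA, GAMMA, EPS, BETA = 0.5, 0.95, 0.1, 1.0  # BETA scales the intrinsic bonus

def step(state, action):
    """Chain walk: action 1 moves right, 0 moves left; reward only at the goal."""
    nxt = min(state + 1, N_STATES - 1) if action == 1 else max(state - 1, 0)
    return nxt, float(nxt == GOAL), nxt == GOAL

q_explore = defaultdict(lambda: [0.0, 0.0])   # trained on extrinsic + intrinsic reward
q_exploit = defaultdict(lambda: [0.0, 0.0])   # trained on extrinsic reward only
visits = defaultdict(int)                     # state visitation counts for the bonus

def eps_greedy(q, s):
    return random.randrange(2) if random.random() < EPS else max(range(2), key=lambda a: q[s][a])

for _ in range(EPISODES):
    s = 0
    for _ in range(HORIZON):
        a = eps_greedy(q_explore, s)          # behaviour comes from the exploration policy
        s2, r_ext, done = step(s, a)
        visits[s2] += 1
        r_int = BETA / (visits[s2] ** 0.5)    # count-based novelty bonus (an assumption)

        # Exploration learner sees the shaped (extrinsic + intrinsic) reward.
        target = (r_ext + r_int) + GAMMA * max(q_explore[s2]) * (not done)
        q_explore[s][a] += ALPHA * (target - q_explore[s][a])

        # Exploitation learner is updated off-policy on the same transition,
        # but only with the extrinsic reward, so it is not biased by the shaping.
        target = r_ext + GAMMA * max(q_exploit[s2]) * (not done)
        q_exploit[s][a] += ALPHA * (target - q_exploit[s][a])

        s = s2
        if done:
            break

# Evaluation uses the exploitation policy greedily.
s, ret = 0, 0.0
for _ in range(HORIZON):
    s, r, done = step(s, max(range(2), key=lambda a: q_exploit[s][a]))
    ret += r
    if done:
        break
print("greedy return of exploitation policy:", ret)
```

The separation means the intrinsic bonus can be scaled or decayed aggressively without corrupting the returns that the exploitation policy is ultimately evaluated on, which is the robustness property the abstract highlights.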

Comments: Unsupervised Reinforcement Learning (URL) Workshop at the 38th International Conference on Machine Learning (ICML), 2021
Categories: cs.LG, cs.AI
Related articles:
arXiv:1802.05054 [cs.LG] (Published 2018-02-14)
GEP-PG: Decoupling Exploration and Exploitation in Deep Reinforcement Learning Algorithms
arXiv:1203.3481 [cs.LG] (Published 2012-03-15)
Real-Time Scheduling via Reinforcement Learning
arXiv:1803.00590 [cs.LG] (Published 2018-03-01)
Hierarchical Imitation and Reinforcement Learning