arXiv:1805.03359 [cs.LG]

Reward Estimation for Variance Reduction in Deep Reinforcement Learning

Joshua Romoff, Alexandre Piché, Peter Henderson, Vincent Francois-Lavet, Joelle Pineau

Published 2018-05-09, Version 1

In reinforcement learning (RL), stochastic environments can make learning a policy difficult because of high variance. Variance reduction methods such as advantage estimation and control-variates estimation have therefore been investigated in other works. Here, we propose to learn a separate reward estimator to train the value function, which helps reduce the variance caused by a noisy reward signal. This yields theoretical reductions in variance in the tabular case, as well as empirical improvements in both the function approximation and tabular settings in environments where rewards are stochastic. We evaluate the approach using a modified version of Advantage Actor Critic (A2C) on variations of Atari games.
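The tabular idea in the abstract can be illustrated with a small sketch. This is not the authors' implementation (the paper works with a modified A2C on Atari); it is a toy TD(0) learner under assumptions made up for illustration: a hypothetical two-state cycle, Gaussian reward noise, and a running-mean reward estimator `reward_hat` that replaces the sampled reward in the TD target. The function name `run_td` and all parameter values are likewise illustrative.

```python
import random
import statistics


def run_td(use_reward_estimator, steps=5000, alpha=0.1, gamma=0.9,
           noise_std=1.0, seed=0):
    """TD(0) on a hypothetical 2-state cycle (0 -> 1 -> 0) with noisy rewards.

    Returns the trajectory of the value estimate V(0) over all steps.
    """
    rng = random.Random(seed)
    true_reward = [1.0, 0.0]   # assumed expected reward for leaving each state
    value = [0.0, 0.0]         # tabular value estimates V(s)
    reward_hat = [0.0, 0.0]    # learned reward estimates R_hat(s)
    visits = [0, 0]
    history = []
    state = 0
    for _ in range(steps):
        next_state = 1 - state
        # Observed reward = expected reward + zero-mean Gaussian noise.
        reward = true_reward[state] + rng.gauss(0.0, noise_std)
        # Update the reward estimator as a running mean of observed rewards.
        visits[state] += 1
        reward_hat[state] += (reward - reward_hat[state]) / visits[state]
        # Train the value function on the estimated or the sampled reward.
        target_reward = reward_hat[state] if use_reward_estimator else reward
        td_error = target_reward + gamma * value[next_state] - value[state]
        value[state] += alpha * td_error
        history.append(value[0])
        state = next_state
    return history


# Compare how much V(0) fluctuates over the last 1000 steps of learning.
sampled = run_td(use_reward_estimator=False)
estimated = run_td(use_reward_estimator=True)
print("tail variance, sampled reward:  ", statistics.pvariance(sampled[-1000:]))
print("tail variance, estimated reward:", statistics.pvariance(estimated[-1000:]))
```

In this toy setting the running mean averages out the reward noise before it reaches the TD target, so the value estimate should fluctuate less late in training. With function approximation the reward estimator would presumably be a learned model rather than a per-state average.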

Comments: Accepted to the International Conference on Learning Representations (ICLR) 2018 Workshop Track

Categories: cs.LG, cs.AI, stat.ML

Keywords: deep reinforcement learning, reward estimation, advantage actor critic, variance reduction methods

Tags: conference paper

Related articles:

arXiv:1806.08894 [cs.LG] (Published 2018-06-23)

Deep Reinforcement Learning: An Overview

Seyed Sajad Mousavi, Michael Schukat, Enda Howley

arXiv:1810.12558 [cs.LG] (Published 2018-10-30)

Relative Importance Sampling For Off-Policy Actor-Critic in Deep Reinforcement Learning

Mahammad Humayoo, Xueqi Cheng

arXiv:1901.02219 [cs.LG] (Published 2019-01-08)