arXiv Analytics

arXiv:2003.14089 [cs.LG]

Leverage the Average: an Analysis of Regularization in RL

Nino Vieillard, Tadashi Kozuno, Bruno Scherrer, Olivier Pietquin, Rémi Munos, Matthieu Geist

Published 2020-03-31 (Version 1)

Building upon the formalism of regularized Markov decision processes, we study the effect of Kullback-Leibler (KL) and entropy regularization in reinforcement learning. Through an equivalent formulation of the related approximate dynamic programming (ADP) scheme, we show that a KL penalty amounts to averaging q-values. This equivalence allows us to draw connections between a priori disconnected methods from the literature, and to prove that KL regularization indeed leads to averaging the errors made at each iteration of the value function update. With the proposed theoretical analysis, we also study the interplay between KL and entropy regularization. When the considered ADP scheme is combined with neural-network-based stochastic approximations, the equivalence is lost, which suggests a number of different ways to apply regularization. Because this goes beyond what we can analyse theoretically, we study this aspect extensively in experiments.
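As a rough illustration of the claimed equivalence, here is a minimal numerical sketch in a hypothetical tabular setting (not the paper's exact ADP scheme; the random q-tables and the regularization weight `lam` are assumptions for illustration). It checks that iterating a KL-regularized greedy step from a uniform policy gives the same policy as a softmax over the summed, i.e. scaled averaged, q-values.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy tabular setting: random q-value tables standing in for the q_k produced
# at successive iterations of an ADP scheme (assumed for illustration only).
n_states, n_actions, n_iters = 4, 3, 10
lam = 0.5  # assumed KL regularization weight
qs = [rng.normal(size=(n_states, n_actions)) for _ in range(n_iters)]

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

# (1) Recursive KL-regularized greedy step:
#     pi_{k+1}(a|s) proportional to pi_k(a|s) * exp(q_k(s,a) / lam)
pi = np.full((n_states, n_actions), 1.0 / n_actions)  # uniform pi_0
for q in qs:
    pi = softmax(np.log(pi) + q / lam)

# (2) Closed form: softmax of the sum (a scaled average) of all past q-values,
#     pi_{k+1}(a|s) proportional to exp( (1/lam) * sum_{j<=k} q_j(s,a) )
pi_avg = softmax(sum(qs) / lam)

print(np.max(np.abs(pi - pi_avg)))  # ~1e-16: the two policies coincide
```

In this sketch the per-state normalization constants cancel at each step, which is why the recursion collapses onto a softmax of the averaged q-values; the abstract's point is that this averaging also applies to the errors made at each iteration.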

Related articles:
arXiv:1901.11275 [cs.LG] (Published 2019-01-31)
A Theory of Regularized Markov Decision Processes
arXiv:1912.01557 [cs.LG] (Published 2019-12-02)
On-policy Reinforcement Learning with Entropy Regularization
arXiv:1811.11214 [cs.LG] (Published 2018-11-27)
Understanding the impact of entropy in policy learning