arXiv Analytics

arXiv:1904.06260 [cs.LG]

Similarities between policy gradient methods (PGM) in Reinforcement learning (RL) and supervised learning (SL)

Eric Benhamou

Published 2019-04-12 (Version 1)

Reinforcement learning (RL) is about sequential decision making and is traditionally contrasted with supervised learning (SL) and unsupervised learning (USL). In RL, given the current state, the agent makes a decision that may influence the next state, whereas in SL (and USL) the next state does not depend on the decisions taken, whether learning is done in batch or online. Although this difference between SL and RL is fundamental, there are connections that have been overlooked. In particular, we prove in this paper that policy gradient methods can be cast as a supervised learning problem in which the true labels are replaced by discounted rewards. We provide a new derivation of policy gradient methods (PGM) that emphasizes their tight link with cross entropy and supervised learning. We also provide a simple experiment in which we interchange labels and pseudo-rewards. We conclude that other connections with SL could be established by modifying the reward functions wisely.
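
To make the stated link concrete, here is a minimal sketch (an illustration, not code from the paper) of how a policy gradient loss can be written as a weighted cross-entropy, assuming a PyTorch-style setup; the function name `reinforce_loss` and its arguments are hypothetical. The actions actually taken play the role of the "true labels", and the discounted returns replace the constant weight of 1 used in standard classification.

```python
import torch.nn.functional as F

def reinforce_loss(logits, actions, returns):
    """Policy gradient (REINFORCE-style) loss written as a weighted cross-entropy.

    logits:  (T, n_actions) policy-network outputs at each visited state
    actions: (T,) long tensor of actions actually taken (the "labels")
    returns: (T,) discounted returns G_t that replace the SL weight of 1
    """
    # Per-step cross-entropy, i.e. -log pi(a_t | s_t); reduction='none'
    # keeps one term per time step instead of averaging immediately.
    ce = F.cross_entropy(logits, actions, reduction="none")
    # Weighting each term by its discounted return yields the usual
    # policy gradient objective; with returns == 1 this collapses to
    # ordinary supervised classification on (state, action) pairs.
    return (returns * ce).mean()
```

Setting `returns` to a vector of ones recovers plain supervised learning on state-action pairs, which mirrors the kind of label/pseudo-reward interchange described above.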

Related articles: Most relevant | Search more
arXiv:2010.05380 [cs.LG] (Published 2020-10-12)
Efficient Wasserstein Natural Gradients for Reinforcement Learning
arXiv:1810.02525 [cs.LG] (Published 2018-10-05)
Where Did My Optimum Go?: An Empirical Analysis of Gradient Descent Optimization in Policy Gradient Methods
arXiv:1912.05104 [cs.LG] (Published 2019-12-11)
Entropy Regularization with Discounted Future State Distribution in Policy Gradient Methods