arXiv Analytics

arXiv:1810.02525 [cs.LG]

Where Did My Optimum Go?: An Empirical Analysis of Gradient Descent Optimization in Policy Gradient Methods

Peter Henderson, Joshua Romoff, Joelle Pineau

Published 2018-10-05 (Version 1)

Recent analyses of certain gradient descent optimization methods have shown that performance can degrade in some settings, such as with stochasticity or implicit momentum. In deep reinforcement learning (Deep RL), such optimization methods are often used to train neural networks via the temporal difference error or the policy gradient. As an agent improves over time, the optimization target changes, and thus the loss landscape (and its local optima) change as well. Due to the failure modes of these methods, the ideal choice of optimizer for Deep RL remains unclear. We therefore provide an empirical analysis of the effects that a wide range of gradient descent optimizers and their hyperparameters have on policy gradient methods, a subset of Deep RL algorithms, on benchmark continuous control tasks. We find that adaptive optimizers have a narrow window of effective learning rates, diverging in other cases, and that the effectiveness of momentum varies with the properties of the environment. Our analysis suggests that there is significant interplay between the dynamics of the environment and the properties of the Deep RL algorithm that is not necessarily accounted for by traditional adaptive gradient methods. We provide suggestions for optimal settings of current methods, and for further lines of research, based on our findings.
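
As a rough illustration of the kind of experiment the abstract describes, the sketch below sweeps a few optimizers and learning rates for a REINFORCE-style policy gradient on a toy stochastic bandit. The environment, the manual SGD/momentum/Adam implementations, and the hyperparameter grid are illustrative assumptions and are not taken from the paper, whose experiments use benchmark continuous control tasks.

# Minimal sketch (not the paper's code): sweep optimizers and learning rates for a
# REINFORCE-style policy gradient on a toy 3-armed bandit with noisy rewards.
import numpy as np

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def reinforce_update(theta, rewards_mean, rng):
    """One REINFORCE gradient estimate on a 3-armed bandit with mean rewards `rewards_mean`."""
    probs = softmax(theta)
    a = rng.choice(len(theta), p=probs)
    r = rewards_mean[a] + rng.normal(0.0, 1.0)   # noisy reward (stochasticity)
    grad_logp = -probs
    grad_logp[a] += 1.0                          # d/d theta of log pi(a | theta)
    return r * grad_logp, r                      # ascent direction on expected return

def make_optimizer(name, lr):
    """Return an update function theta <- theta + step(grad); manual SGD / momentum / Adam."""
    state = {"m": 0.0, "v": 0.0, "t": 0}
    def step(theta, grad):
        if name == "sgd":
            return theta + lr * grad
        if name == "momentum":
            state["m"] = 0.9 * state["m"] + grad
            return theta + lr * state["m"]
        if name == "adam":
            state["t"] += 1
            state["m"] = 0.9 * state["m"] + 0.1 * grad
            state["v"] = 0.999 * state["v"] + 0.001 * grad ** 2
            m_hat = state["m"] / (1 - 0.9 ** state["t"])
            v_hat = state["v"] / (1 - 0.999 ** state["t"])
            return theta + lr * m_hat / (np.sqrt(v_hat) + 1e-8)
        raise ValueError(name)
    return step

def run(name, lr, steps=2000, seed=0):
    rng = np.random.default_rng(seed)
    rewards_mean = np.array([1.0, 0.0, -1.0])    # arm 0 is the best action
    theta = np.zeros(3)
    opt = make_optimizer(name, lr)
    returns = []
    for _ in range(steps):
        grad, r = reinforce_update(theta, rewards_mean, rng)
        theta = opt(theta, grad)
        returns.append(r)
    return np.mean(returns[-200:])               # average return near the end of training

if __name__ == "__main__":
    for name in ["sgd", "momentum", "adam"]:
        for lr in [1e-3, 1e-2, 1e-1, 1.0]:
            print(f"{name:9s} lr={lr:<6g} final avg return {run(name, lr):+.2f}")

Running the script prints the end-of-training average return for each optimizer and learning-rate pair, so the sensitivity of each method to its step size can be compared directly even on this toy problem.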

Comments: Accepted at the European Workshop on Reinforcement Learning 2018 (EWRL14)
Categories: cs.LG, cs.AI, stat.ML
Related articles:
arXiv:1904.06260 [cs.LG] (Published 2019-04-12)
Similarities between policy gradient methods (PGM) in Reinforcement learning (RL) and supervised learning (SL)
arXiv:1912.05104 [cs.LG] (Published 2019-12-11)
Entropy Regularization with Discounted Future State Distribution in Policy Gradient Methods
arXiv:2409.19437 [cs.LG] (Published 2024-09-28, updated 2024-10-23)
Strongly-polynomial time and validation analysis of policy gradient methods