arXiv Analytics

arXiv:1907.09765 [cs.LG]

Variance Reduction in Actor Critic Methods (ACM)

Eric Benhamou

Published 2019-07-23, Version 1

After presenting Actor Critic Methods (ACM), we show that ACM are control variate estimators. Using the projection theorem, we prove that the Q Actor Critic (QAC) and Advantage Actor Critic (AAC) methods are optimal, in the sense of the $L^2$ norm, among the control variate estimators spanned by functions conditioned on the current state and action. This straightforward application of the Pythagorean theorem provides a theoretical justification for the strong performance of QAC and AAC (the latter most often referred to as A2C) in deep policy gradient methods. It also enables us to derive a new formulation of the Advantage Actor Critic method that has lower variance and improves on the traditional A2C method.
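
To make the control variate argument concrete, here is a brief sketch in standard policy-gradient notation; the symbols $J(\theta)$, $b(s)$, $V^{\pi}$, and $A^{\pi}$ are the usual ones and are not taken from the paper itself. The policy gradient can be written as
$$\nabla_\theta J(\theta) \;=\; \mathbb{E}_{s,a}\!\left[\nabla_\theta \log \pi_\theta(a \mid s)\, Q^{\pi}(s,a)\right].$$
Subtracting a state-dependent baseline $b(s)$ leaves this expectation unchanged, since
$$\mathbb{E}_{a \sim \pi_\theta(\cdot \mid s)}\!\left[\nabla_\theta \log \pi_\theta(a \mid s)\, b(s)\right] \;=\; b(s)\, \nabla_\theta \sum_a \pi_\theta(a \mid s) \;=\; b(s)\, \nabla_\theta 1 \;=\; 0,$$
so $b(s)$ acts as a control variate that affects only the variance of the estimator. The projection theorem in $L^2$ then identifies the variance-minimizing control variate within a given family as the orthogonal projection (a conditional expectation) of the estimator onto that family; choosing $b(s) = V^{\pi}(s)$ recovers the advantage $A^{\pi}(s,a) = Q^{\pi}(s,a) - V^{\pi}(s)$ used by A2C.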

Related articles:
arXiv:1803.07246 [cs.LG] (Published 2018-03-20)
Variance Reduction for Policy Gradient with Action-Dependent Factorized Baselines
Cathy Wu et al.
arXiv:2411.10438 [cs.LG] (Published 2024-11-15)
MARS: Unleashing the Power of Variance Reduction for Training Large Models
arXiv:2006.03041 [cs.LG] (Published 2020-06-04)
Sample Complexity of Asynchronous Q-Learning: Sharper Analysis and Variance Reduction