arXiv Analytics

arXiv:1907.09765 [cs.LG]

Variance Reduction in Actor Critic Methods (ACM)

Eric Benhamou

Published 2019-07-23, Version 1

After presenting Actor Critic Methods (ACM), we show that ACM are control variate estimators. Using the projection theorem, we prove that the Q Actor Critic (QAC) and Advantage Actor Critic (AAC) methods are optimal, in the sense of the $L^2$ norm, among the control variate estimators spanned by functions conditioned on the current state and action. This straightforward application of the Pythagorean theorem provides a theoretical justification for the strong performance of QAC and AAC (the latter most often referred to as A2C) in deep policy gradient methods. It also enables us to derive a new formulation of the Advantage Actor Critic method that has lower variance and improves on the traditional A2C method.
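
To make the control variate argument concrete, here is a brief sketch in standard policy-gradient notation; the symbols $J(\theta)$, $b(s)$, $V^{\pi}$, and $A^{\pi}$ are the usual ones and are not taken from the paper itself. The policy gradient can be written as
$$\nabla_\theta J(\theta) \;=\; \mathbb{E}_{s,a}\!\left[\nabla_\theta \log \pi_\theta(a \mid s)\, Q^{\pi}(s,a)\right].$$
Subtracting a state-dependent baseline $b(s)$ leaves this expectation unchanged, since
$$\mathbb{E}_{a \sim \pi_\theta(\cdot \mid s)}\!\left[\nabla_\theta \log \pi_\theta(a \mid s)\, b(s)\right] \;=\; b(s)\, \nabla_\theta \sum_a \pi_\theta(a \mid s) \;=\; b(s)\, \nabla_\theta 1 \;=\; 0,$$
so $b(s)$ acts as a control variate that affects only the variance of the estimator. The projection theorem in $L^2$ then identifies the variance-minimizing control variate within a given family as the orthogonal projection (a conditional expectation) of the estimator onto that family; choosing $b(s) = V^{\pi}(s)$ recovers the advantage $A^{\pi}(s,a) = Q^{\pi}(s,a) - V^{\pi}(s)$ used by A2C.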

Related articles:
arXiv:1803.07246 [cs.LG] (Published 2018-03-20)
Variance Reduction for Policy Gradient with Action-Dependent Factorized Baselines
Cathy Wu et al.
arXiv:2411.10438 [cs.LG] (Published 2024-11-15)
MARS: Unleashing the Power of Variance Reduction for Training Large Models
arXiv:2006.03041 [cs.LG] (Published 2020-06-04)
Sample Complexity of Asynchronous Q-Learning: Sharper Analysis and Variance Reduction