arXiv Analytics


arXiv:2005.10175 [cs.LG]

Finite-sample Analysis of Greedy-GQ with Linear Function Approximation under Markovian Noise

Yue Wang, Shaofeng Zou

Published 2020-05-20 (Version 1)

Greedy-GQ is an off-policy, two-timescale algorithm for optimal control in reinforcement learning. This paper develops the first finite-sample analysis of the Greedy-GQ algorithm with linear function approximation under Markovian noise. Our finite-sample analysis provides theoretical justification for choosing stepsizes of this two-timescale algorithm for faster convergence in practice, and suggests a trade-off between the convergence rate and the quality of the obtained policy. Our paper extends the finite-sample analyses of two-timescale reinforcement learning algorithms from policy evaluation to optimal control, which is of greater practical interest. Specifically, in contrast to existing finite-sample analyses for two-timescale methods, e.g., GTD, GTD2, and TDC, whose objective functions are convex, the objective function of the Greedy-GQ algorithm is non-convex. Moreover, the Greedy-GQ algorithm is also not a linear two-timescale stochastic approximation algorithm. Our techniques in this paper provide a general framework for the finite-sample analysis of non-convex value-based reinforcement learning algorithms for optimal control.
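Since the abstract centers on the two-timescale structure of Greedy-GQ with linear function approximation, a minimal sketch of the standard update rules (following Maei et al., 2010) may help fix notation. Function names, the feature interface, and the per-step API below are illustrative assumptions, not code from the paper.

```python
import numpy as np

def greedy_gq_step(theta, w, phi_sa, phi_next_all, reward, gamma, alpha, beta):
    """One Greedy-GQ update with linear function approximation (sketch).

    phi_sa       : feature vector of the current state-action pair
    phi_next_all : matrix stacking feature vectors of all actions at the
                   next state, one row per action (illustrative interface)
    alpha, beta  : stepsizes for the slow (theta) and fast (w) timescales
    """
    # Greedy action at the next state under the current main parameter theta.
    q_next = phi_next_all @ theta
    phi_next = phi_next_all[np.argmax(q_next)]

    # TD error with respect to the greedy target.
    delta = reward + gamma * (phi_next @ theta) - phi_sa @ theta

    # Slow timescale: main parameter theta (stepsize alpha).
    theta = theta + alpha * (delta * phi_sa - gamma * (w @ phi_sa) * phi_next)

    # Fast timescale: auxiliary parameter w (stepsize beta, chosen larger
    # / decaying more slowly than alpha).
    w = w + beta * (delta - w @ phi_sa) * phi_sa

    return theta, w
```

The two-timescale structure corresponds to running the auxiliary iterate w on a faster stepsize schedule than theta; the stepsize choices quantified by the paper's finite-sample bounds govern exactly this separation.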

Related articles:
arXiv:1902.02234 [cs.LG] (Published 2019-02-06)
Finite-Sample Analysis for SARSA and Q-Learning with Linear Function Approximation
arXiv:1911.00934 [cs.LG] (Published 2019-11-03)
Finite-Sample Analysis of Decentralized Temporal-Difference Learning with Linear Function Approximation
arXiv:2105.12540 [cs.LG] (Published 2021-05-26)
Finite-Sample Analysis of Off-Policy Natural Actor-Critic with Linear Function Approximation