arXiv:2105.12540 [cs.LG]

Finite-Sample Analysis of Off-Policy Natural Actor-Critic with Linear Function Approximation

Zaiwei Chen, Sajad Khodadadian, Siva Theja Maguluri

Published: 2021-05-26 (Version 1)

In this paper, we develop a novel variant of the off-policy natural actor-critic algorithm with linear function approximation and establish a sample complexity of $\mathcal{O}(\epsilon^{-3})$, improving upon all previously known convergence bounds for such algorithms. To overcome the divergence caused by the deadly triad in off-policy policy evaluation under function approximation, we develop a critic that employs the $n$-step TD-learning algorithm with a properly chosen $n$. We present finite-sample convergence bounds for this critic under both constant and diminishing step sizes, which are of independent interest. Furthermore, we develop a variant of natural policy gradient under function approximation with an improved convergence rate of $\mathcal{O}(1/T)$ after $T$ iterations. Combining the finite-sample error bounds of the actor and the critic, we obtain the $\mathcal{O}(\epsilon^{-3})$ sample complexity. We derive our sample complexity bounds solely under the assumption that the behavior policy sufficiently explores all states and actions, which is a much weaker assumption than those made in the related literature.
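The critic described above is an $n$-step TD-learning scheme with linear function approximation. Below is a minimal Python sketch of a generic $n$-step TD critic update of that kind; the environment interface (env.reset, env.step), the feature map phi, the behavior policy, and all hyperparameters ($n$, step size, discount factor) are illustrative assumptions, and the sketch omits the off-policy corrections and the specific choice of $n$ analyzed in the paper.

    import numpy as np

    def n_step_td_critic(env, phi, behavior_policy, n=8, alpha=0.05,
                         gamma=0.95, num_steps=10_000, dim=16, seed=0):
        """Sketch: estimate a linear value function w^T phi(s) from a single
        trajectory generated by a behavior policy, using n-step TD targets
        with a constant step size alpha. Interfaces are hypothetical."""
        rng = np.random.default_rng(seed)
        w = np.zeros(dim)                  # critic weights
        s = env.reset()
        window = []                        # rolling window of (state, reward)
        for _ in range(num_steps):
            a = behavior_policy(s, rng)
            s_next, r = env.step(s, a)     # assumed to return (next state, reward)
            window.append((s, r))
            if len(window) == n:
                s0, _ = window[0]
                # n-step return: discounted rewards plus a bootstrapped tail
                G = sum((gamma ** k) * rk for k, (_, rk) in enumerate(window))
                G += (gamma ** n) * float(w @ phi(s_next))
                # TD update of the oldest state in the window toward that target
                w += alpha * (G - float(w @ phi(s0))) * phi(s0)
                window.pop(0)
            s = s_next
        return w

A larger $n$ places more weight on sampled rewards relative to the bootstrapped term, which is the lever the paper uses to control the bias introduced by off-policy bootstrapping with function approximation.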

Related articles:
arXiv:2102.09318 [cs.LG] (Published 2021-02-18)
Finite-Sample Analysis of Off-Policy Natural Actor-Critic Algorithm
arXiv:1911.00934 [cs.LG] (Published 2019-11-03)
Finite-Sample Analysis of Decentralized Temporal-Difference Learning with Linear Function Approximation
arXiv:1902.02234 [cs.LG] (Published 2019-02-06)
Finite-Sample Analysis for SARSA and Q-Learning with Linear Function Approximation