arXiv Analytics

arXiv:2210.07338 [cs.LG]

Reinforcement Learning with Unbiased Policy Evaluation and Linear Function Approximation

Anna Winnicki, R. Srikant

Published 2022-10-13, Version 1

We provide performance guarantees for a variant of simulation-based policy iteration for controlling Markov decision processes (MDPs) that combines stochastic approximation with state-of-the-art techniques useful for very large MDPs, including lookahead, function approximation, and gradient descent. Specifically, we analyze two algorithms: the first uses a least-squares approach, where a new set of weights associated with the feature vectors is obtained via least-squares minimization at each iteration; the second is a two-time-scale stochastic approximation algorithm that takes several steps of gradient descent towards the least-squares solution before obtaining the next iterate via stochastic approximation.
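The two algorithms differ only in how the weight vector is updated between policy improvements, which a small sketch makes concrete. Below is a minimal NumPy illustration: the toy random MDP, the random feature matrix `Phi`, the noisy one-step Bellman targets standing in for simulation, and all step sizes are assumptions introduced here for illustration, not the paper's setting, whose algorithms also involve lookahead of general depth and two-time-scale step-size conditions not modeled in this sketch.

```python
# Illustrative sketch only: the toy MDP, features, noise model, and step
# sizes are assumptions, not the paper's exact algorithms or analysis.
import numpy as np

rng = np.random.default_rng(0)
S, A, k, gamma = 20, 4, 5, 0.9              # states, actions, features, discount
P = rng.dirichlet(np.ones(S), size=(S, A))  # P[s, a] = next-state distribution
R = rng.random((S, A))                      # rewards
Phi = rng.standard_normal((S, k))           # feature matrix (linear approximation)

def greedy_policy(v):
    # One-step lookahead / policy improvement against the current value estimate.
    q = R + gamma * P @ v                   # q[s, a]
    return q.argmax(axis=1)

def sampled_targets(pi, v, noise=0.05):
    # Stand-in for simulation: noisy one-step Bellman targets under policy pi.
    idx = np.arange(S)
    return R[idx, pi] + gamma * P[idx, pi] @ v + noise * rng.standard_normal(S)

def policy_iteration(solver, iters=50):
    w = np.zeros(k)
    for _ in range(iters):
        v = Phi @ w                         # current value estimate
        pi = greedy_policy(v)               # policy improvement
        w = solver(sampled_targets(pi, v), w)
    return w

# Algorithm 1 (sketch): fresh least-squares fit of the weights each iteration.
def ls_solver(targets, w):
    w_new, *_ = np.linalg.lstsq(Phi, targets, rcond=None)
    return w_new

# Algorithm 2 (sketch): several gradient-descent steps toward the
# least-squares solution (fast time scale) before the next policy
# update (slow time scale).
def gd_solver(targets, w, lr=0.01, steps=25):
    for _ in range(steps):
        w = w - lr * Phi.T @ (Phi @ w - targets) / S
    return w

w_ls = policy_iteration(ls_solver)
w_gd = policy_iteration(gd_solver)
```

The design point the abstract highlights: the second solver avoids solving a full least-squares problem at every policy update, tracking its solution only approximately via gradient descent, which is what makes the two-time-scale analysis necessary.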

Related articles:
arXiv:1902.07656 [cs.LG] (Published 2019-02-20)
LOSSGRAD: automatic learning rate in gradient descent
arXiv:2204.08809 [cs.LG] (Published 2022-04-19)
Making Progress Based on False Discoveries
arXiv:2203.16462 [cs.LG] (Published 2022-03-30)
Convergence of gradient descent for deep neural networks