arXiv:2505.01828 [math.OC]

Rank-One Modified Value Iteration

Arman Sharifi Kolarijani, Tolga Ok, Peyman Mohajerin Esfahani, Mohamad Amin Sharif Kolarijani

Published 2025-05-03 (Version 1)

In this paper, we provide a novel algorithm for solving planning and learning problems for Markov decision processes. The proposed algorithm follows a policy-iteration-type update, using a rank-one approximation of the transition probability matrix in the policy evaluation step. This rank-one approximation is closely related to the stationary distribution of the corresponding transition probability matrix, which is approximated using the power method. We provide theoretical guarantees that the proposed algorithm converges to the optimal (action-)value function with the same rate and computational complexity as the value iteration algorithm in the planning problem and as the Q-learning algorithm in the learning problem. Through extensive numerical simulations, however, we show that the proposed algorithm consistently outperforms first-order algorithms and their accelerated versions on both planning and learning problems.
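To make the idea in the abstract concrete, the sketch below illustrates one plausible reading of it: the policy transition matrix in the policy evaluation step is replaced by a rank-one surrogate built from its stationary distribution, which is itself estimated with the power method. This is a minimal illustrative sketch under assumptions made here, not the paper's actual algorithm: the surrogate P_pi ≈ 1·mu^T, the Sherman-Morrison closed form for the resulting evaluation, the greedy improvement loop, and all function names are hypothetical.

```python
import numpy as np

def stationary_distribution(P, num_iters=50):
    """Approximate the stationary distribution of a row-stochastic matrix P
    via the power method (left eigenvector associated with eigenvalue 1)."""
    n = P.shape[0]
    mu = np.full(n, 1.0 / n)
    for _ in range(num_iters):
        mu = mu @ P          # one power-method step: mu <- mu^T P
        mu /= mu.sum()       # renormalize to a probability vector
    return mu

def rank_one_policy_evaluation(r_pi, P_pi, gamma, num_power_iters=50):
    """Evaluate a fixed policy under the rank-one surrogate P_pi ~= 1 mu^T.
    By Sherman-Morrison, (I - gamma * 1 mu^T)^{-1} r_pi has the closed form
    r_pi + gamma / (1 - gamma) * (mu^T r_pi) * 1, so no linear system is solved."""
    mu = stationary_distribution(P_pi, num_power_iters)
    return r_pi + gamma / (1.0 - gamma) * (mu @ r_pi) * np.ones_like(r_pi)

def rank_one_modified_vi(P, r, gamma, num_iters=100):
    """Illustrative planning loop: greedy policy improvement followed by the
    approximate (rank-one) policy evaluation step.
    P: (A, S, S) transition tensor, r: (A, S) reward array."""
    A, S, _ = P.shape
    V = np.zeros(S)
    for _ in range(num_iters):
        Q = r + gamma * np.einsum('asz,z->as', P, V)   # one-step lookahead
        pi = Q.argmax(axis=0)                          # greedy policy
        P_pi = P[pi, np.arange(S), :]                  # (S, S) policy transition matrix
        r_pi = r[pi, np.arange(S)]                     # (S,) policy reward vector
        V = rank_one_policy_evaluation(r_pi, P_pi, gamma)
    return V, pi
```

The appeal of a rank-one surrogate in this sketch is that the evaluation step reduces to a closed-form correction of the reward vector via Sherman-Morrison, so each outer iteration costs about as much as a value-iteration sweep, which is consistent with the complexity statement in the abstract.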
