arXiv:2205.14335 [math.OC]

On the Sample Complexity of Stabilizing Linear Systems via Policy Gradient Methods

Feiran Zhao, Xingyun Fu, Keyou You

Published 2022-05-28 (Version 1)

Stabilizing unknown dynamical systems directly from data samples has drawn increasing attention in both the control and machine learning communities. In this paper, we study the sample complexity of stabilizing linear time-invariant systems via Policy Gradient (PG) methods. Our analysis is built upon a discounted Linear Quadratic Regulator (LQR) framework that alternately updates the policy and the discount factor of the LQR. In sharp contrast to the existing literature, we propose an explicit rule to adaptively adjust the discount factor by characterizing the stability margin using Lyapunov theory, which is of independent interest. We show that the number of iterations per discount factor is uniformly upper bounded, which enables us to prove the sample complexity of stabilizing linear systems via PG methods. In particular, it adds only a coefficient logarithmic in the spectral radius of the state matrix to the sample complexity of solving LQR problems. We perform numerical experiments to verify our theoretical findings and empirically evaluate the effectiveness of our approach on nonlinear systems.
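
The abstract describes an alternating scheme: run PG on a discounted LQR cost until the closed loop is stable under the rescaled dynamics, then enlarge the discount factor using the stability margin. The sketch below illustrates that idea only; it is not the authors' algorithm. The matrices A, B, Q, R, the rollout horizon, step sizes, the two-point zeroth-order gradient estimate (standing in for the PG oracle), and the simple margin-based update of gamma are all illustrative assumptions rather than the paper's explicit Lyapunov-based rule.

```python
import numpy as np

# Minimal sketch (illustrative assumptions, not the paper's algorithm):
# stabilize x_{t+1} = A x_t + B u_t with u_t = -K x_t by alternating
# (i) zeroth-order policy gradient steps on a discounted LQR cost and
# (ii) enlargements of the discount factor gamma based on a margin proxy.
rng = np.random.default_rng(0)
A = np.array([[1.2, 0.5], [0.0, 1.1]])   # unstable open loop, rho(A) > 1
B = np.eye(2)
Q, R = np.eye(2), np.eye(2)
n, m = B.shape

def discounted_cost(K, gamma, horizon=60, n_rollouts=20):
    """Monte-Carlo estimate of the discounted LQR cost under u = -K x."""
    total = 0.0
    for _ in range(n_rollouts):
        x = rng.standard_normal(n)
        for t in range(horizon):
            u = -K @ x
            total += gamma**t * (x @ Q @ x + u @ R @ u)
            x = A @ x + B @ u
    return total / n_rollouts

def pg_step(K, gamma, radius=0.05, lr=1e-3, n_dirs=40):
    """One two-point zeroth-order policy gradient step (illustrative tuning)."""
    grad = np.zeros_like(K)
    for _ in range(n_dirs):
        U = rng.standard_normal(K.shape)
        U /= np.linalg.norm(U)
        diff = discounted_cost(K + radius * U, gamma) - discounted_cost(K - radius * U, gamma)
        grad += (diff / (2 * radius)) * U
    return K - lr * grad / n_dirs

K = np.zeros((m, n))
# Start with gamma small enough that sqrt(gamma) * A is Schur stable for K = 0.
gamma = 0.9 / np.max(np.abs(np.linalg.eigvals(A)))**2
for _ in range(50):                           # outer updates of the discount factor
    for _ in range(30):                       # inner PG iterations per discount factor
        K = pg_step(K, gamma)
    rho = np.max(np.abs(np.linalg.eigvals(np.sqrt(gamma) * (A - B @ K))))
    margin = max(1.0 - rho**2, 0.0)           # crude stability-margin proxy
    gamma = min(1.0, gamma / (1.0 - 0.5 * margin))  # enlarge gamma using the margin
    if gamma >= 1.0:
        break
print("closed-loop spectral radius:", np.max(np.abs(np.linalg.eigvals(A - B @ K))))
```

Under this toy update, if rho(sqrt(gamma)(A - BK)) < 1 before the increase, the new rescaled closed loop satisfies rho^2 / (1 - 0.5(1 - rho^2)) = 2 rho^2 / (1 + rho^2) < 1, so each enlargement preserves stability of the discounted problem; once gamma reaches 1, the undiscounted closed loop A - BK is stable. The paper's contribution is an explicit, Lyapunov-theoretic version of this rule together with a uniform bound on the inner iterations, which yields the stated sample complexity.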

Related articles:
arXiv:2406.03734 [math.OC] (Published 2024-06-06)
Policy Gradient Methods for the Cost-Constrained LQR: Strong Duality and Global Convergence
arXiv:2101.01041 [math.OC] (Published 2021-01-04)
Derivative-Free Policy Optimization for Risk-Sensitive and Robust Control Design: Implicit Regularization and Sample Complexity
arXiv:2110.11721 [math.OC] (Published 2021-10-22, updated 2022-04-03)
Projection-Free Algorithm for Stochastic Bi-level Optimization