arXiv:2205.14335 [math.OC]

On the Sample Complexity of Stabilizing Linear Systems via Policy Gradient Methods

Feiran Zhao, Xingyun Fu, Keyou You

Published 2022-05-28 (Version 1)

Stabilizing unknown dynamical systems directly from data samples has drawn increasing attention in both the control and machine learning communities. In this paper, we study the sample complexity of stabilizing linear time-invariant systems via Policy Gradient (PG) methods. Our analysis is built upon a discounted Linear Quadratic Regulator (LQR) framework that alternately updates the policy and the discount factor of the LQR. In sharp contrast to the existing literature, we propose an explicit rule to adaptively adjust the discount factor by characterizing the stability margin using Lyapunov theory, which is of independent interest. We show that the number of iterations per discount factor is uniformly upper bounded, which enables us to prove the sample complexity of stabilizing linear systems via PG methods. In particular, it adds only a coefficient logarithmic in the spectral radius of the state matrix to the sample complexity of solving LQR problems. We perform numerical experiments to verify our theoretical findings and empirically evaluate the effectiveness of our approach on nonlinear systems.
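
The abstract describes an alternating scheme: run PG on a discounted LQR cost until the closed loop is stable under the rescaled dynamics, then enlarge the discount factor using the stability margin. The sketch below illustrates that idea only; it is not the authors' algorithm. The matrices A, B, Q, R, the rollout horizon, step sizes, the two-point zeroth-order gradient estimate (standing in for the PG oracle), and the simple margin-based update of gamma are all illustrative assumptions rather than the paper's explicit Lyapunov-based rule.

```python
import numpy as np

# Minimal sketch (illustrative assumptions, not the paper's algorithm):
# stabilize x_{t+1} = A x_t + B u_t with u_t = -K x_t by alternating
# (i) zeroth-order policy gradient steps on a discounted LQR cost and
# (ii) enlargements of the discount factor gamma based on a margin proxy.
rng = np.random.default_rng(0)
A = np.array([[1.2, 0.5], [0.0, 1.1]])   # unstable open loop, rho(A) > 1
B = np.eye(2)
Q, R = np.eye(2), np.eye(2)
n, m = B.shape

def discounted_cost(K, gamma, horizon=60, n_rollouts=20):
    """Monte-Carlo estimate of the discounted LQR cost under u = -K x."""
    total = 0.0
    for _ in range(n_rollouts):
        x = rng.standard_normal(n)
        for t in range(horizon):
            u = -K @ x
            total += gamma**t * (x @ Q @ x + u @ R @ u)
            x = A @ x + B @ u
    return total / n_rollouts

def pg_step(K, gamma, radius=0.05, lr=1e-3, n_dirs=40):
    """One two-point zeroth-order policy gradient step (illustrative tuning)."""
    grad = np.zeros_like(K)
    for _ in range(n_dirs):
        U = rng.standard_normal(K.shape)
        U /= np.linalg.norm(U)
        diff = discounted_cost(K + radius * U, gamma) - discounted_cost(K - radius * U, gamma)
        grad += (diff / (2 * radius)) * U
    return K - lr * grad / n_dirs

K = np.zeros((m, n))
# Start with gamma small enough that sqrt(gamma) * A is Schur stable for K = 0.
gamma = 0.9 / np.max(np.abs(np.linalg.eigvals(A)))**2
for _ in range(50):                           # outer updates of the discount factor
    for _ in range(30):                       # inner PG iterations per discount factor
        K = pg_step(K, gamma)
    rho = np.max(np.abs(np.linalg.eigvals(np.sqrt(gamma) * (A - B @ K))))
    margin = max(1.0 - rho**2, 0.0)           # crude stability-margin proxy
    gamma = min(1.0, gamma / (1.0 - 0.5 * margin))  # enlarge gamma using the margin
    if gamma >= 1.0:
        break
print("closed-loop spectral radius:", np.max(np.abs(np.linalg.eigvals(A - B @ K))))
```

Under this toy update, if rho(sqrt(gamma)(A - BK)) < 1 before the increase, the new rescaled closed loop satisfies rho^2 / (1 - 0.5(1 - rho^2)) = 2 rho^2 / (1 + rho^2) < 1, so each enlargement preserves stability of the discounted problem; once gamma reaches 1, the undiscounted closed loop A - BK is stable. The paper's contribution is an explicit, Lyapunov-theoretic version of this rule together with a uniform bound on the inner iterations, which yields the stated sample complexity.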

Related articles:
arXiv:2406.03734 [math.OC] (Published 2024-06-06)
Policy Gradient Methods for the Cost-Constrained LQR: Strong Duality and Global Convergence
arXiv:2101.01041 [math.OC] (Published 2021-01-04)
Derivative-Free Policy Optimization for Risk-Sensitive and Robust Control Design: Implicit Regularization and Sample Complexity
arXiv:2110.11721 [math.OC] (Published 2021-10-22, updated 2022-04-03)
Projection-Free Algorithm for Stochastic Bi-level Optimization