
arXiv:2103.09280 [math.OC]

Value-Gradient based Formulation of Optimal Control Problem and Machine Learning Algorithm

Alain Bensoussan, Jiayue Han, Sheung Chi Phillip Yam, Xiang Zhou

Published 2021-03-16, Version 1

Optimal control problems are typically formulated through the Hamilton-Jacobi-Bellman (HJB) equation for the value function, and it is well known that the value function is the viscosity solution of the HJB equation. Once the HJB solution is known, the optimal control can be constructed by taking the minimizer of the Hamiltonian. In this work, instead of focusing on the value function, we propose a new formulation for the components of the gradient of the value function (the value-gradient) as a decoupled system of partial differential equations, in the context of the continuous-time deterministic discounted optimal control problem. We develop an efficient iterative scheme that solves this system of equations in parallel by exploiting the property that they share the same characteristic curves as the HJB equation for the value function. This property allows us to generalize prior successive approximation algorithms of policy iteration from the value function to the value-gradient functions. To be compatible with high-dimensional control problems, we generate multiple characteristic curves at each policy iteration from an ensemble of initial states, and compute both the value function and its gradient simultaneously on each curve as the labelled data. A supervised machine learning strategy is then applied to minimize the weighted squared loss for both the value function and its gradient. Experimental results on various examples demonstrate that this new strategy of jointly learning both the value function and its gradient not only significantly increases the accuracy but also improves the efficiency and robustness, particularly with fewer characteristics data or fewer training steps.
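The data-generation and joint-fitting steps described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes a hypothetical scalar linear-quadratic problem with a fixed linear policy, and it swaps the neural network for a quadratic polynomial fitted by linear least squares. All names and parameter values (`a`, `b`, `q`, `r`, `rho`, `k`, `labels_on_curve`, `fit_joint`) are illustrative assumptions.

```python
import numpy as np

# Hypothetical scalar problem (illustrative values, not from the paper):
# dynamics dx/dt = a*x + b*u, running cost q*x^2 + r*u^2, discount rate rho,
# and a fixed linear policy u = -k*x that is being evaluated.
a, b, q, r, rho, k = 0.5, 1.0, 1.0, 0.5, 1.0, 1.0
c = a - b * k  # closed-loop rate, so dx/dt = c*x under the policy


def trapezoid(y, t):
    """Composite trapezoidal rule on the grid t."""
    return float(np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(t)))


def labels_on_curve(x0, horizon=40.0, steps=40001):
    """Return (value, gradient) labels for the curve starting at x0."""
    t = np.linspace(0.0, horizon, steps)
    x = x0 * np.exp(c * t)              # trajectory starting at x0
    dx_dx0 = np.exp(c * t)              # sensitivity of x(t) to x0
    cost = (q + r * k**2) * x**2        # running cost under u = -k*x
    dcost_dx = 2.0 * (q + r * k**2) * x
    disc = np.exp(-rho * t)             # discount factor
    value = trapezoid(disc * cost, t)
    grad = trapezoid(disc * dcost_dx * dx_dx0, t)
    return value, grad


def fit_joint(xs, values, grads, weight=1.0):
    """Minimize sum (V_theta - V)^2 + weight * sum (V_theta' - G)^2
    for the model V_theta(x) = theta0 + theta1*x + theta2*x^2."""
    phi = np.stack([np.ones_like(xs), xs, xs**2], axis=1)
    dphi = np.stack([np.zeros_like(xs), np.ones_like(xs), 2.0 * xs], axis=1)
    sw = np.sqrt(weight)
    design = np.vstack([phi, sw * dphi])         # stacked value and gradient rows
    target = np.concatenate([values, sw * grads])
    theta, *_ = np.linalg.lstsq(design, target, rcond=None)
    return theta


# Ensemble of initial states, one curve per state.
xs = np.linspace(-2.0, 2.0, 5)
data = np.array([labels_on_curve(x0) for x0 in xs])
theta = fit_joint(xs, data[:, 0], data[:, 1], weight=1.0)
print(theta)
```

For this toy problem the value of the fixed policy has the closed form (q + r k^2) x0^2 / (rho - 2c), so the fitted coefficients can be checked against it. The `weight` argument plays the role of the relative weighting between the value loss and the gradient loss.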

Categories: math.OC

Keywords: value function, machine learning algorithm, discounted optimal control problem, deterministic discounted optimal control, prior successive approximation algorithms

Related articles:

arXiv:1211.3724 [math.OC] (Published 2012-11-15, updated 2013-05-23)

Variational properties of value functions

Aleksandr Y. Aravkin, James V. Burke, Michael P. Friedlander

arXiv:1409.5986 [math.OC] (Published 2014-09-21)

Domain Decomposition for Stochastic Optimal Control

Matanya B. Horowitz, Ivan Papusha, Joel W. Burdick

arXiv:1703.10746 [math.OC] (Published 2017-03-31)

Sufficient conditions for the value function and optimal strategy to be even and quasi-convex

Jhelum Chakravorty, Aditya Mahajan