arXiv:2412.16561 [math.OC]

A learning-based approach to stochastic optimal control under reach-avoid constraint

Published 2024-12-21, Version 1

We develop a model-free approach to optimally control stochastic, Markovian systems subject to a reach-avoid constraint. Specifically, the state trajectory must remain within a safe set while reaching a target set within a finite time horizon. Due to the time-dependent nature of these constraints, we show that, in general, the optimal policy for this constrained stochastic control problem is non-Markovian, which increases the computational complexity. To address this challenge, we apply the state-augmentation technique from arXiv:2402.19360, reformulating the problem as a constrained Markov decision process (CMDP) on an extended state space. This transformation allows us to search for a Markovian policy, avoiding the complexity of non-Markovian policies. To learn the optimal policy without a system model, and using only trajectory data, we develop a log-barrier policy gradient approach. We prove that under suitable assumptions, the policy parameters converge to the optimal parameters, while ensuring that the system trajectories satisfy the stochastic reach-avoid constraint with high probability.
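The two ideas named in the abstract, state augmentation and a log-barrier objective, can be illustrated with a minimal sketch. This is not the paper's algorithm: the mode flags (ACTIVE, REACHED, FAILED), the function names, and the empirical success-rate estimate are all assumptions made for illustration, and the sketch assumes the target set counts as reached even if it is checked before the safe set. The point is only that once a mode flag is appended to the state, reach-avoid success depends on the current augmented state alone, and that a log-barrier term can fold a constraint of the form "success probability at least some threshold" into a single objective.

```python
import math

# Hypothetical mode flags for the augmented state (names are illustrative).
ACTIVE, REACHED, FAILED = 0, 1, 2

def augment_step(mode, next_state, in_safe, in_target):
    """Update the mode flag after the system moves to next_state."""
    if mode != ACTIVE:
        return mode  # REACHED and FAILED are absorbing
    if in_target(next_state):
        return REACHED
    if not in_safe(next_state):
        return FAILED
    return ACTIVE

def reach_avoid_success(trajectory, in_safe, in_target):
    """True if the trajectory hits the target before leaving the safe set."""
    mode = ACTIVE
    for state in trajectory:
        mode = augment_step(mode, state, in_safe, in_target)
    return mode == REACHED

def log_barrier_objective(avg_cost, success_rate, threshold, eta):
    """Cost plus a log-barrier term for success_rate >= threshold.

    Assumed form: avg_cost - eta * log(success_rate - threshold).
    Returns infinity when the constraint is not strictly satisfied.
    """
    slack = success_rate - threshold
    if slack <= 0:
        return math.inf
    return avg_cost - eta * math.log(slack)
```

In a model-free setting, `avg_cost` and `success_rate` would be estimated from sampled trajectories, and a policy gradient method would descend the resulting objective. The barrier term grows without bound as the estimated success rate approaches the threshold, which is what keeps iterates strictly feasible.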

Categories: math.OC, cs.LG

Keywords: stochastic optimal control, learning-based approach, optimal policy, log-barrier policy gradient approach, constrained stochastic control problem

Related articles:

arXiv:1810.13043 [math.OC] (Published 2018-10-30)

Stochastic Optimal Control of Epidemic Processes in Networks

Lars Lorch, Abir De, Samir Bhatt, William Trouleau, Utkarsh Upadhyay, Manuel Gomez-Rodriguez

arXiv:2308.07507 [math.OC] (Published 2023-08-15)

Condition-Based Production for Stochastically Deteriorating Systems: Optimal Policies and Learning

Collin Drent, Melvin Drent, Joachim Arts

arXiv:2410.04615 [math.OC] (Published 2024-10-06)

Time-reversal solution of BSDEs in stochastic optimal control: a linear quadratic study

Yuhang Mei, Amirhossein Taghvaei