arXiv:1903.05812 [math.OC]
Reinforcement Learning for Decentralized Stochastic Control and Coordination Games
Bora Yongacoglu, Gürdal Arslan, Serdar Yüksel
Published 2019-03-14, Version 1
In the study of stochastic dynamic team problems, analytical methods for finding optimal policies are often inapplicable due to a lack of prior knowledge of the cost function or the state dynamics. Reinforcement learning offers a possible solution to such coordination problems. Existing learning methods for coordinating play either rely on control sharing among controllers or, in general, do not guarantee convergence to optimal policies. In a recent paper, we provided a decentralized algorithm for finding equilibrium policies in weakly acyclic stochastic dynamic games, which contain team games as an important special case. However, stochastic dynamic teams can in general possess suboptimal equilibrium policies whose cost can be arbitrarily larger than the cost of a team-optimal policy. In this paper, we present a reinforcement learning algorithm and its refinements, and provide probabilistic guarantees for convergence to globally optimal policies in team games as well as in a more general class of coordination games. The algorithms presented here are strictly decentralized in that they require access only to local information, such as cost realizations, previous local actions, and state transitions.
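The strictly decentralized setting described above can be illustrated with a minimal independent-learner sketch. This is not the paper's algorithm (which includes further refinements and convergence guarantees); it is only a toy in which each of two controllers runs Q-learning over its own actions, observing nothing but the realized team cost. The cost matrix, learning rates, and exploration rate below are hypothetical choices for illustration; the matrix is built so that the joint action (0,0) is team-optimal while (1,1) is a strictly worse alternative.

```python
import random

# Hypothetical 2-agent coordination game (rows: agent 0's action,
# columns: agent 1's action). Joint action (0,0) is team-optimal with
# cost 0.0; (1,1) is a suboptimal alternative; mismatches cost 1.0.
COST = [[0.0, 1.0],
        [1.0, 0.5]]

def independent_q_learning(episodes=20000, alpha=0.1, eps=0.1, seed=0):
    """Each agent keeps a Q-table over ITS OWN actions only and updates
    it from the shared cost realization -- no action sharing occurs."""
    rng = random.Random(seed)
    q = [[0.0, 0.0], [0.0, 0.0]]  # q[i][a]: agent i's estimate for action a
    for _ in range(episodes):
        # Epsilon-greedy action selection, using local information only.
        acts = [rng.randrange(2) if rng.random() < eps
                else min(range(2), key=lambda a: q[i][a])
                for i in range(2)]
        cost = COST[acts[0]][acts[1]]  # each agent observes this scalar
        for i in range(2):
            a = acts[i]
            q[i][a] += alpha * (cost - q[i][a])
    return q

q = independent_q_learning()
greedy = [min(range(2), key=lambda a: q[i][a]) for i in range(2)]
print(greedy, COST[greedy[0]][greedy[1]])
```

In this toy instance the agents settle on the team-optimal joint action, but as the abstract notes, independent learners of this kind can in general get stuck at suboptimal equilibria; avoiding that failure mode is precisely what the refinements in the paper address.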