arXiv:2502.04788 [math.OC]

A non-zero-sum game with reinforcement learning under mean-variance framework

Junyi Guo, Xia Han, Hao Wang, Kam Chuen Yuen

Published 2025-02-07, Version 1

In this paper, we investigate a competitive market involving two agents who consider both their own wealth and the wealth gap with their opponent. Both agents can invest in a financial market consisting of a risk-free asset and a risky asset, under conditions where model parameters are partially or completely unknown. This setup gives rise to a non-zero-sum differential game within the framework of reinforcement learning (RL). Each agent aims to maximize his own Choquet-regularized, time-inconsistent mean-variance objective. Adopting the dynamic programming approach, we derive a time-consistent Nash equilibrium strategy in a general incomplete market setting. Under the additional assumption of a Gaussian mean return model, we obtain an explicit analytical solution, which facilitates the development of a practical RL algorithm. Notably, the proposed algorithm achieves uniform convergence, even though the conventional policy improvement theorem does not apply to the equilibrium policy. Numerical experiments demonstrate the robustness and effectiveness of the algorithm, underscoring its potential for practical implementation.

Categories: math.OC

Keywords: reinforcement learning, non-zero-sum game, mean-variance framework, Gaussian mean return model, time-consistent Nash equilibrium strategy

Related articles:

arXiv:1802.07668 [math.OC] (Published 2018-02-21)

A model for system uncertainty in reinforcement learning

Ryan Murray, Michele Palladino

arXiv:1906.11392 [math.OC] (Published 2019-06-27)

From self-tuning regulators to reinforcement learning and back again

Nikolai Matni, Alexandre Proutiere, Anders Rantzer, Stephen Tu

arXiv:2003.02894 [math.OC] (Published 2020-03-05)

Distributional Robustness and Regularization in Reinforcement Learning

Esther Derman, Shie Mannor