arXiv:2502.04788 [math.OC]

A non-zero-sum game with reinforcement learning under mean-variance framework

Junyi Guo, Xia Han, Hao Wang, Kam Chuen Yuen

Published 2025-02-07 (Version 1)

In this paper, we investigate a competitive market involving two agents, each of whom cares about both their own wealth and the wealth gap with their opponent. Both agents can invest in a financial market consisting of a risk-free asset and a risky asset, under conditions where the model parameters are partially or completely unknown. This setup gives rise to a non-zero-sum differential game within the framework of reinforcement learning (RL). Each agent aims to maximize their own Choquet-regularized, time-inconsistent mean-variance objective. Adopting a dynamic programming approach, we derive a time-consistent Nash equilibrium strategy in a general incomplete-market setting. Under the additional assumption of a Gaussian mean return model, we obtain an explicit analytical solution, which facilitates the development of a practical RL algorithm. Notably, the proposed algorithm achieves uniform convergence even though the conventional policy improvement theorem does not apply to the equilibrium policy. Numerical experiments demonstrate the robustness and effectiveness of the algorithm, underscoring its potential for practical implementation.
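The abstract does not reproduce the objective, so the following is a minimal sketch of the standard competitive mean-variance form used in this literature; the relative-performance weight theta_i, the risk aversion gamma_i, and the temperature lambda are illustrative symbols, not the paper's notation. Writing Y_i = X_i - theta_i * X_j for agent i's competition-adjusted terminal wealth, agent i chooses an exploratory (randomized) policy pi_i to maximize

    J_i(pi_i; pi_j) = E[Y_i(T)] - (gamma_i / 2) * Var[Y_i(T)] + lambda * H_C(pi_i),

where H_C is a Choquet regularizer playing the role that the Shannon-entropy exploration bonus plays in standard exploratory stochastic control. A time-consistent Nash equilibrium is a pair (pi_1*, pi_2*) from which neither agent gains by a unilateral deviation at any intermediate time, not only at time 0; this is why the mean-variance objective is called time-inconsistent and why dynamic programming is applied in the game-theoretic, equilibrium sense.

To make the learning loop concrete, here is a self-contained toy in Python. It is a generic plug-in sketch under strong simplifying assumptions, not the authors' algorithm: both agents hold Gaussian exploratory policies centered on a competitive Merton-type rule, the unknown drift is estimated by a running average of observed excess returns, and the linear best-response fixed point a_i = c_i + theta_i * a_j stands in for the paper's equilibrium characterization. Every parameter value below is made up for illustration.

import numpy as np

rng = np.random.default_rng(0)

# Market parameters -- unknown to the agents, used only to simulate data.
mu, sigma, r = 0.08, 0.20, 0.02      # drift, volatility, risk-free rate
dt, n_steps = 1 / 252, 252           # daily steps over a one-year episode

gamma = np.array([2.0, 3.0])         # assumed risk aversions of agents 1, 2
theta = np.array([0.3, 0.5])         # assumed competition weights
lam = 0.05                           # assumed exploration temperature

mu_hat, n_obs = 0.0, 0               # running estimate of the drift

for episode in range(50):
    X = np.ones(2)                   # both agents restart with unit wealth
    for _ in range(n_steps):
        # Plug-in Sharpe ratio from the current drift estimate.
        rho = (mu_hat - r) / sigma
        # Stand-alone Merton fractions, then the Nash fixed point of the
        # linear best responses a_i = c_i + theta_i * a_j (illustrative).
        c = rho / (sigma * gamma)
        denom = 1.0 - theta[0] * theta[1]
        a = np.array([(c[0] + theta[0] * c[1]) / denom,
                      (c[1] + theta[1] * c[0]) / denom])
        # Exploratory allocations: Gaussian around the equilibrium mean,
        # with variance set by the exploration temperature.
        alloc = rng.normal(a, np.sqrt(lam))
        # One step of the risky asset's excess return, wealth update,
        # and a running-average update of the drift estimate.
        dR = (mu - r) * dt + sigma * np.sqrt(dt) * rng.standard_normal()
        X *= 1.0 + r * dt + alloc * dR
        n_obs += 1
        mu_hat += ((dR / dt + r) - mu_hat) / n_obs

print(f"drift estimate: {mu_hat:.4f} (true {mu}); final wealths: {X.round(3)}")

In this toy the randomized allocations both explore (the drift estimate improves as data accumulate) and exploit (the policy mean tracks the current equilibrium), which is the basic exploration-exploitation trade-off the paper's Choquet regularization is designed to control; the uniform-convergence guarantee claimed in the abstract concerns the authors' actual iteration, not this sketch.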

Related articles:
arXiv:1802.07668 [math.OC] (Published 2018-02-21)
A model for system uncertainty in reinforcement learning
arXiv:1906.11392 [math.OC] (Published 2019-06-27)
From self-tuning regulators to reinforcement learning and back again
arXiv:2003.02894 [math.OC] (Published 2020-03-05)
Distributional Robustness and Regularization in Reinforcement Learning