arXiv Analytics


arXiv:2008.10870 [cs.LG]

Theory of Deep Q-Learning: A Dynamical Systems Perspective

Arunselvan Ramaswamy

Published 2020-08-25, Version 1

Deep Q-Learning is an important algorithm used to solve sequential decision-making problems. It involves training a Deep Neural Network, called a Deep Q-Network (DQN), to approximate the function associated with optimal decision making, the Q-function. Although the algorithm is wildly successful in laboratory conditions, serious gaps between theory and practice prevent its use in the real world. In this paper, we present a comprehensive analysis of the popular and practical version of the algorithm, under realistic, verifiable assumptions. An important contribution is the characterization of its performance as a function of training. To do this, we view the algorithm as an evolving dynamical system. This facilitates associating a closely related measure process with training. The long-term behavior of Deep Q-Learning is then determined by the limit of the aforementioned measure process. Empirical observations, such as the qualitative advantage of using experience replay, and performance inconsistencies even after training, are explained using our analysis. Further, our theory is general and accommodates state Markov processes with multiple stationary distributions.
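To make the abstract's ingredients concrete, the following is a minimal sketch of Q-learning with an experience-replay buffer and a function approximator. Everything here is an illustrative assumption, not the paper's setup: the toy chain environment, the linear one-hot Q-network standing in for a deep network, and all hyperparameters (epsilon, learning rate, buffer size) are invented for the example.

```python
import random
from collections import deque
import numpy as np

class ChainEnv:
    """Toy MDP (hypothetical): walk left/right on a chain; reward 1 at the right end."""
    def __init__(self, n=5):
        self.n, self.s = n, 0
    def reset(self):
        self.s = 0
        return self.s
    def step(self, a):  # a in {0: left, 1: right}
        self.s = max(0, min(self.n - 1, self.s + (1 if a == 1 else -1)))
        done = self.s == self.n - 1
        return self.s, (1.0 if done else 0.0), done

def one_hot(s, n):
    x = np.zeros(n)
    x[s] = 1.0
    return x

class QNet:
    """Linear Q-function approximator on one-hot states (stand-in for a DQN)."""
    def __init__(self, n_states, n_actions, lr=0.1):
        self.W = np.zeros((n_actions, n_states))
        self.lr = lr
    def q(self, s_vec):
        return self.W @ s_vec
    def update(self, s_vec, a, target):
        td = target - self.q(s_vec)[a]      # temporal-difference error
        self.W[a] += self.lr * td * s_vec   # semi-gradient update step

def train(episodes=200, gamma=0.9, eps=0.5, batch=16, seed=0):
    rng = random.Random(seed)
    env, net = ChainEnv(), QNet(5, 2)
    replay = deque(maxlen=1000)             # experience replay buffer
    for _ in range(episodes):
        s, done = env.reset(), False
        while not done:
            # epsilon-greedy action selection
            if rng.random() < eps:
                a = rng.randrange(2)
            else:
                a = int(np.argmax(net.q(one_hot(s, 5))))
            s2, r, done = env.step(a)
            replay.append((s, a, r, s2, done))
            s = s2
            # learn from a random minibatch of stored transitions
            if len(replay) >= batch:
                for (bs, ba, br, bs2, bd) in rng.sample(replay, batch):
                    target = br if bd else br + gamma * np.max(net.q(one_hot(bs2, 5)))
                    net.update(one_hot(bs, 5), ba, target)
    return net

net = train()
```

Sampling minibatches from the replay buffer, rather than learning only from the most recent transition, breaks temporal correlations in the data stream; the paper's analysis explains the qualitative advantage of this design choice.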

Related articles:
arXiv:1807.01251 [cs.LG] (Published 2018-07-03)
Training behavior of deep neural network in frequency domain
arXiv:1905.07777 [cs.LG] (Published 2019-05-19)
A type of generalization error induced by initialization in deep neural networks
arXiv:1905.03381 [cs.LG] (Published 2019-05-08)
AutoAssist: A Framework to Accelerate Training of Deep Neural Networks