arXiv:1704.04932 [cs.LG]

Deep Relaxation: partial differential equations for optimizing deep neural networks

Pratik Chaudhari, Adam Oberman, Stanley Osher, Stefano Soatto, Guillaume Carlier

Published 2017-04-17 (Version 1)

We establish connections between non-convex optimization methods for training deep neural networks (DNNs) and the theory of partial differential equations (PDEs). In particular, we focus on relaxation techniques initially developed in statistical physics and show that the resulting relaxed loss functions are solutions of a nonlinear Hamilton-Jacobi-Bellman equation. We use the underlying stochastic control problem to analyze the geometry of the relaxed energy landscape and the convergence properties of the resulting algorithms, confirming empirical evidence. This paper opens non-convex optimization problems arising in deep learning to ideas from the PDE literature. In particular, we show that the non-viscous Hamilton-Jacobi equation leads to an elegant algorithm, based on the Hopf-Lax formula, that outperforms state-of-the-art methods. Furthermore, we show that these algorithms scale well in practice and can effectively tackle the high dimensionality of modern neural networks.
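
For orientation, here is a minimal LaTeX sketch of the equations the abstract alludes to, written with assumed symbols (loss f, relaxed loss u, artificial time t, inverse temperature \beta) rather than the paper's own notation. The non-viscous Hamilton-Jacobi equation and its classical Hopf-Lax solution are standard results; the viscous variant is the one usually associated with local-entropy-style relaxations.

\begin{align*}
  % Non-viscous Hamilton-Jacobi equation for the relaxed loss u(x, t)
  &\partial_t u + \tfrac{1}{2}\,\lVert \nabla_x u \rVert^2 = 0,
    \qquad u(x, 0) = f(x), \\[4pt]
  % Hopf-Lax formula: the solution is the Moreau envelope of the original loss f
  &u(x, t) = \inf_{y} \Big[ f(y) + \tfrac{1}{2t}\,\lVert x - y \rVert^2 \Big], \\[4pt]
  % Viscous Hamilton-Jacobi equation: adds a Laplacian term weighted by beta^{-1}
  &\partial_t u + \tfrac{1}{2}\,\lVert \nabla_x u \rVert^2 = \tfrac{\beta^{-1}}{2}\,\Delta u .
\end{align*}

Under suitable regularity, if the infimum in the Hopf-Lax formula is attained at y*, then \nabla_x u(x, t) = (x - y*)/t, so a gradient step on the relaxed loss u amounts to a proximal-point-style update on the original loss f; this is one way such a formula can be turned into a training algorithm, though the paper's specific scheme may differ in detail.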

Related articles:
arXiv:1804.04272 [cs.LG] (Published 2018-04-12)
Deep Neural Networks motivated by Partial Differential Equations
arXiv:2301.10737 [cs.LG] (Published 2023-01-25)
Distributed Control of Partial Differential Equations Using Convolutional Reinforcement Learning
arXiv:2303.17078 [cs.LG] (Published 2023-03-30)
Machine Learning for Partial Differential Equations