arXiv Analytics

arXiv:1905.10961 [stat.ML]

Fast Convergence of Natural Gradient Descent for Overparameterized Neural Networks

Guodong Zhang, James Martens, Roger Grosse

Published 2019-05-27 (Version 1)

Natural gradient descent has proven effective at mitigating the effects of pathological curvature in neural network optimization, but little is known theoretically about its convergence properties, especially for nonlinear networks. In this work, we analyze for the first time the speed of convergence of natural gradient descent on nonlinear neural networks with the squared-error loss. We identify two conditions which guarantee efficient convergence from random initialization: (1) the Jacobian matrix (of the network's outputs for all training cases with respect to the parameters) has full row rank, and (2) the Jacobian matrix is stable under small perturbations around the initialization. For two-layer ReLU neural networks (i.e., with one hidden layer), we prove that these two conditions do in fact hold throughout training, under the assumptions of nondegenerate inputs and overparameterization. We further extend our analysis to more general loss functions. Lastly, we show that K-FAC, an approximate natural gradient descent method, also converges to global minima under the same assumptions.
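
To make the setting concrete, here is a minimal sketch (not the authors' code) of one exact natural-gradient step for a two-layer ReLU network with squared-error loss. It assumes the overparameterized regime described in the abstract: the n x p Jacobian J has full row rank (condition (1)), so the pseudo-inverse-Fisher update can be computed through the n x n Gram matrix G = J J^T. The network width, learning rate, and the small damping term `lam` are illustrative choices, not values from the paper.

```python
# Sketch of one exact natural-gradient descent (NGD) step, assuming the Jacobian
# has full row rank so the update reduces to theta <- theta - lr * J^+ r.
import jax
import jax.numpy as jnp
from jax.flatten_util import ravel_pytree

def init_params(key, d_in, width):
    k1, k2 = jax.random.split(key)
    W = jax.random.normal(k1, (width, d_in))               # hidden-layer weights
    a = jax.random.normal(k2, (width,)) / jnp.sqrt(width)  # output weights
    return {"W": W, "a": a}

def f(params, X):
    # Two-layer ReLU network (one hidden layer): f(x) = a^T relu(W x)
    return jax.nn.relu(X @ params["W"].T) @ params["a"]

def ngd_step(params, X, y, lr=1.0, lam=1e-6):
    n = X.shape[0]
    flat, unravel = ravel_pytree(params)          # flatten params into one vector
    f_flat = lambda p: f(unravel(p), X)
    J = jax.jacobian(f_flat)(flat)                # n x p Jacobian of outputs w.r.t. params
    r = f_flat(flat) - y                          # residuals on the training set
    # With a full-row-rank Jacobian, the pseudo-inverse Fisher update goes through
    # the Gram matrix G = J J^T; damping is added only for numerical stability.
    G = J @ J.T + lam * jnp.eye(n)
    delta = J.T @ jnp.linalg.solve(G, r)
    return unravel(flat - lr * delta)

# Illustrative usage on random data:
key = jax.random.PRNGKey(0)
X = jax.random.normal(key, (32, 5))
y = jnp.sin(X[:, 0])
params = init_params(key, d_in=5, width=512)
params = ngd_step(params, X, y)
```

Forming the full Jacobian and Gram matrix is tractable only for small problems; K-FAC, mentioned in the abstract, is one way to approximate this update at scale.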

Related articles:
arXiv:2106.06251 [stat.ML] (Published 2021-06-11)
On Learnability via Gradient Method for Two-Layer ReLU Neural Networks in Teacher-Student Setting
arXiv:2205.14818 [stat.ML] (Published 2022-05-30)
Excess Risk of Two-Layer ReLU Neural Networks in Teacher-Student Settings and its Superiority to Kernel Methods
arXiv:2310.20579 [stat.ML] (Published 2023-10-31)
Initialization Matters: Privacy-Utility Analysis of Overparameterized Neural Networks