arXiv:2205.10217 [stat.ML]

Memorization and Optimization in Deep Neural Networks with Minimum Over-parameterization

Simone Bombari, Mohammad Hossein Amani, Marco Mondelli

Published 2022-05-20 (Version 1)

The Neural Tangent Kernel (NTK) has emerged as a powerful tool to provide memorization, optimization and generalization guarantees in deep neural networks. A line of work has studied the NTK spectrum for two-layer and deep networks with at least one layer of $\Omega(N)$ neurons, $N$ being the number of training samples. Furthermore, there is increasing evidence suggesting that deep networks with sub-linear layer widths are powerful memorizers and optimizers, as long as the number of parameters exceeds the number of samples. Thus, a natural open question is whether the NTK is well conditioned in such a challenging sub-linear setup. In this paper, we answer this question in the affirmative. Our key technical contribution is a lower bound on the smallest NTK eigenvalue for deep networks with the minimum possible over-parameterization: the number of parameters is roughly $\Omega(N)$ and, hence, the number of neurons is as few as $\Omega(\sqrt{N})$. To showcase the applicability of our NTK bounds, we provide two results concerning memorization capacity and optimization guarantees for gradient descent training.
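For context, the NTK Gram matrix whose conditioning the abstract refers to is usually defined as below; this is the standard formulation from the NTK literature, with the notation ($f$ for the network output, $\theta$ for its parameters, $x_i$ for the training inputs) chosen here for illustration rather than taken from the paper itself:

$$ K_{ij}(\theta) \;=\; \big\langle \nabla_\theta f(x_i;\theta),\, \nabla_\theta f(x_j;\theta) \big\rangle, \qquad 1 \le i, j \le N. $$

Saying that the NTK is "well conditioned" then amounts to bounding its smallest eigenvalue away from zero, $\lambda_{\min}(K) \ge c > 0$; this is the quantity lower bounded in the paper's main result, and it is what drives the memorization and gradient-descent convergence guarantees.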

Related articles:
arXiv:1412.5896 [stat.ML] (Published 2014-12-18)
On the Stability of Deep Networks
arXiv:1712.09482 [stat.ML] (Published 2017-12-27)
Robust Loss Functions under Label Noise for Deep Neural Networks
arXiv:1806.01316 [stat.ML] (Published 2018-06-04)
Universal Statistics of Fisher Information in Deep Neural Networks: Mean Field Approach