arXiv:2105.14368 [stat.ML]

Fit without fear: remarkable mathematical phenomena of deep learning through the prism of interpolation

Mikhail Belkin

Published 2021-05-29, Version 1

In the past decade, the mathematical theory of machine learning has lagged far behind the triumphs of deep neural networks on practical challenges. However, the gap between theory and practice is gradually starting to close. In this paper I will attempt to assemble some pieces of the remarkable and still incomplete mathematical mosaic emerging from the efforts to understand the foundations of deep learning. The two key themes will be interpolation and its sibling, over-parameterization. Interpolation corresponds to fitting data, even noisy data, exactly. Over-parameterization enables interpolation and provides the flexibility to select the right interpolating model. As we will see, just as a physical prism separates colors mixed within a ray of light, the figurative prism of interpolation helps to disentangle the generalization and optimization properties within the complex picture of modern machine learning. This article is written with the belief and hope that a clearer understanding of these issues will bring us a step closer to a general theory of deep learning and machine learning.
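
As a concrete illustration of the interpolation theme, here is a minimal sketch (not from the paper; the random cosine feature map, the dimensions, the noise level, and the seed are all illustrative assumptions). With many more features than training points, the minimum-norm least-squares solution fits noisy data exactly, yet can still track the underlying signal off the training set:

    # Illustrative sketch: interpolation via over-parameterization.
    import numpy as np

    rng = np.random.default_rng(0)

    # Noisy training data from a simple target function.
    n_train = 20
    x_train = rng.uniform(-1.0, 1.0, size=n_train)
    y_train = np.sin(3.0 * x_train) + 0.1 * rng.standard_normal(n_train)

    # Over-parameterized random cosine feature map: D >> n_train.
    D = 500
    w = rng.standard_normal(D)             # random frequencies
    b = rng.uniform(0.0, 2.0 * np.pi, D)   # random phases

    def features(x):
        # Map scalars x to D-dimensional random cosine features.
        return np.cos(np.outer(x, w) * 5.0 + b)

    Phi = features(x_train)                # (n_train, D): underdetermined system

    # Minimum-norm interpolating solution: theta = pinv(Phi) @ y.
    theta = np.linalg.pinv(Phi) @ y_train

    train_residual = np.max(np.abs(Phi @ theta - y_train))
    print(f"max training residual: {train_residual:.2e}")  # ~ 0: exact fit of the noisy data

    # Despite fitting the noise exactly, the interpolant can still
    # approximate the underlying clean signal reasonably well.
    x_test = np.linspace(-1.0, 1.0, 200)
    y_pred = features(x_test) @ theta
    test_rmse = np.sqrt(np.mean((y_pred - np.sin(3.0 * x_test)) ** 2))
    print(f"test RMSE vs. clean target: {test_rmse:.3f}")

Among the infinitely many parameter vectors that interpolate the data, the pseudoinverse picks the one of minimum norm; in this sketch that serves as a simple stand-in for the implicit regularization of over-parameterized training.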

Comments: A version of this paper will appear in Acta Numerica
Categories: stat.ML, cs.LG, math.ST, stat.TH
Related articles:
arXiv:1908.11140 [stat.ML] (Published 2019-08-29)
Deep Learning and MARS: A Connection
arXiv:1704.01312 [stat.ML] (Published 2017-04-05)
On Generalization and Regularization in Deep Learning
arXiv:1804.10988 [stat.ML] (Published 2018-04-29)
SHADE: Information-Based Regularization for Deep Learning