arXiv Analytics

arXiv:1803.09357 [cs.LG]

On the Local Minima of the Empirical Risk

Chi Jin, Lydia T. Liu, Rong Ge, Michael I. Jordan

Published 2018-03-25, updated 2018-10-17 (version 2)

Population risk is always of primary interest in machine learning; however, learning algorithms only have access to the empirical risk. Even for applications with nonconvex and nonsmooth losses (such as modern deep networks), the population risk is generally significantly better behaved, from an optimization point of view, than the empirical risk. In particular, sampling can create many spurious local minima. We consider a general framework that aims to optimize a smooth nonconvex function $F$ (population risk) given only access to an approximation $f$ (empirical risk) that is pointwise close to $F$ (i.e., $\|F-f\|_{\infty} \le \nu$). Our objective is to find the $\epsilon$-approximate local minima of the underlying function $F$ while avoiding the shallow local minima (arising because of the tolerance $\nu$) that exist only in $f$. We propose a simple algorithm based on stochastic gradient descent (SGD) on a smoothed version of $f$ that is guaranteed to achieve our goal as long as $\nu \le O(\epsilon^{1.5}/d)$. We also provide an almost matching lower bound showing that our algorithm achieves the optimal error tolerance $\nu$ among all algorithms making a polynomial number of queries to $f$. As a concrete example, we show that our results can be directly used to give sample complexities for learning a ReLU unit.
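
The core idea described in the abstract is to run SGD not on $f$ itself but on a Gaussian-smoothed surrogate of $f$, whose gradient can be estimated from function values alone. The following is a minimal illustrative sketch in Python/NumPy of that idea under stated assumptions; it is not the paper's reference implementation. The helper names, step sizes, and the toy objective are illustrative, and the saddle-point perturbation used in the paper's full guarantee is omitted.

# Minimal sketch (assumptions noted above): SGD on a Gaussian-smoothed
# surrogate f_sigma(x) = E_{z ~ N(0, sigma^2 I)}[f(x + z)], with gradients of
# f_sigma estimated from queries to f only.
import numpy as np

def smoothed_grad_estimate(f, x, sigma, n_queries, rng):
    """Zeroth-order estimate of grad f_sigma(x).

    Uses the identity grad f_sigma(x) = E[z * (f(x + z) - f(x))] / sigma^2
    with z ~ N(0, sigma^2 I); subtracting f(x) leaves the expectation
    unchanged and reduces variance.
    """
    d = x.shape[0]
    z = rng.normal(scale=sigma, size=(n_queries, d))
    fx = f(x)
    deltas = np.array([f(x + zi) - fx for zi in z])
    return (z * deltas[:, None]).mean(axis=0) / sigma**2

def smoothed_sgd(f, x0, sigma=0.1, lr=0.01, n_iters=500, n_queries=50, seed=0):
    """Plain SGD on the smoothed surrogate; returns the final iterate."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float)
    for _ in range(n_iters):
        g = smoothed_grad_estimate(f, x, sigma, n_queries, rng)
        x = x - lr * g
    return x

if __name__ == "__main__":
    # Toy 1-D example: F(x) = x^2 plus a small high-frequency wiggle playing
    # the role of an empirical risk f that is pointwise close to F but has
    # shallow spurious local minima.
    f = lambda x: float(x[0] ** 2 + 0.01 * np.sin(50.0 * x[0]))
    print(smoothed_sgd(f, np.array([2.0])))

Intuitively, choosing the smoothing radius on the order of the perturbation level washes out the shallow minima that exist only in $f$, while leaving the $\epsilon$-approximate local minima of $F$ reachable by the smoothed iterates.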

Related articles:
arXiv:1903.02154 [cs.LG] (Published 2019-03-06)
A Priori Estimates of the Population Risk for Residual Networks
arXiv:1907.05476 [cs.LG] (Published 2019-07-11)
Minimizers of the Empirical Risk and Risk Monotonicity
arXiv:1703.09833 [cs.LG] (Published 2017-03-28)
Theory II: Landscape of the Empirical Risk in Deep Learning