arXiv:1810.07716 [stat.ML]

The loss surface of deep linear networks viewed through the algebraic geometry lens

Dhagash Mehta, Tianran Chen, Tingting Tang, Jonathan D. Hauenstein

Published 2018-10-17Version 1

Using the viewpoint of modern computational algebraic geometry, we explore properties of the optimization landscapes of deep linear neural network models. After clarifying the various definitions of "flat" minima, we show that geometrically flat minima, which are merely artifacts of residual continuous symmetries of deep linear networks, can be straightforwardly removed by a generalized $L_2$ regularization. We then establish upper bounds on the number of isolated stationary points of these networks with the help of algebraic geometry. Using these upper bounds and a numerical algebraic geometry method, we find all stationary points for networks of modest depth and matrix size. We show that in the presence of non-zero regularization, deep linear networks indeed possess local minima that are not global minima. Our computational results clarify certain aspects of the loss surfaces of deep linear networks and provide novel insights.
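The symmetry-removal claim in the abstract can be illustrated with a toy example (a hypothetical sketch, not the authors' code): a depth-2 scalar "deep linear network" $f(x) = w_2 w_1 x$ has a continuous rescaling symmetry $(w_1, w_2) \mapsto (c\,w_1, w_2/c)$ that leaves the unregularized squared loss unchanged, producing a flat valley of minima; adding an $L_2$ penalty breaks this symmetry.

```python
# Toy depth-2 scalar linear network f(x) = w2*w1*x with squared loss
# and an L2 penalty (illustrative sketch; values x=1, y=2 are arbitrary).
def loss(w1, w2, lam, x=1.0, y=2.0):
    return (w2 * w1 * x - y) ** 2 + lam * (w1 ** 2 + w2 ** 2)

c = 3.0  # rescaling factor for the symmetry (w1, w2) -> (c*w1, w2/c)

# Without regularization the rescaled point has the same loss:
unreg_a = loss(1.0, 2.0, lam=0.0)
unreg_b = loss(c * 1.0, 2.0 / c, lam=0.0)

# With lam > 0 the penalty term distinguishes the two points,
# so the flat (symmetry) direction is no longer flat:
reg_a = loss(1.0, 2.0, lam=0.1)
reg_b = loss(c * 1.0, 2.0 / c, lam=0.1)
```

Here `unreg_a == unreg_b` (the flat direction), while `reg_a != reg_b`, mirroring how the generalized $L_2$ regularization isolates the minima.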

Related articles: Most relevant | Search more
arXiv:1809.10374 [stat.ML] (Published 2018-09-27)
An analytic theory of generalization dynamics and transfer learning in deep linear networks
arXiv:2212.14457 [stat.ML] (Published 2022-12-29)
Bayesian Interpolation with Deep Linear Networks
arXiv:2405.13456 [stat.ML] (Published 2024-05-22)
Deep linear networks for regression are implicitly regularized towards flat minima