arXiv Analytics

arXiv:2412.14031 [math.OC]

Gauss-Newton Dynamics for Neural Networks: A Riemannian Optimization Perspective

Semih Cayci

Published 2024-12-18 (Version 1)

We analyze the convergence of Gauss-Newton dynamics for training neural networks with smooth activation functions. In the underparameterized regime, the Gauss-Newton gradient flow induces a Riemannian gradient flow on a low-dimensional, smooth, embedded submanifold of the Euclidean output space. Using tools from Riemannian optimization, we prove \emph{last-iterate} convergence of the Riemannian gradient flow to the optimal in-class predictor at an \emph{exponential rate} that is independent of the conditioning of the Gram matrix, \emph{without} requiring explicit regularization. We further characterize the critical impact of the neural network scaling factor and of the initialization on the convergence behavior. In the overparameterized regime, we show that the Levenberg-Marquardt dynamics with an appropriately chosen damping factor yields robustness to ill-conditioned kernels, analogous to that in the underparameterized regime. These findings demonstrate the potential of Gauss-Newton methods for efficiently optimizing neural networks, particularly in ill-conditioned problems where the kernel and Gram matrices have small singular values.
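
For readers who want the dynamics in symbols, a standard continuous-time formulation consistent with the abstract is sketched below; the notation is an assumption on our part, not taken from the paper itself: $f(\theta)$ collects the network outputs on the training data, $y$ the targets, and $J(\theta)$ the Jacobian of the outputs with respect to the parameters $\theta$. The Gauss-Newton gradient flow is
\[
\dot{\theta}(t) = -\bigl(J(\theta(t))^{\top} J(\theta(t))\bigr)^{\dagger}\, J(\theta(t))^{\top}\bigl(f(\theta(t)) - y\bigr),
\]
and its Levenberg-Marquardt (damped) variant is
\[
\dot{\theta}(t) = -\bigl(J(\theta(t))^{\top} J(\theta(t)) + \lambda I\bigr)^{-1} J(\theta(t))^{\top}\bigl(f(\theta(t)) - y\bigr), \qquad \lambda > 0,
\]
where $\lambda$ is the damping factor that supplies the robustness to ill-conditioned kernels discussed above. Pushing the first flow forward through $f$ gives $\tfrac{d}{dt} f(\theta(t)) = -P_{T}\bigl(f(\theta(t)) - y\bigr)$, the projection of the output-space residual onto the tangent space of the image manifold, which is consistent with the abstract's statement that the Gauss-Newton flow induces a Riemannian gradient flow on an embedded submanifold of the output space.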

Related articles:
arXiv:1910.01895 [math.OC] (Published 2019-10-04)
Approximate policy iteration using neural networks for storage problems
arXiv:2303.04436 [math.OC] (Published 2023-03-08, updated 2023-09-07)
A comparison of rational and neural network based approximations
arXiv:1802.08539 [math.OC] (Published 2018-02-23)
Computation of optimal transport and related hedging problems via penalization and neural networks