
arXiv:2201.02115 [stat.ML]

The dynamics of representation learning in shallow, non-linear autoencoders

Published 2022-01-06, updated 2022-06-16 (Version 2)

Autoencoders are the simplest neural network for unsupervised learning, and thus an ideal framework for studying feature learning. While a detailed understanding of the dynamics of linear autoencoders has recently been obtained, the study of non-linear autoencoders has been hindered by the technical difficulty of handling training data with non-trivial correlations - a fundamental prerequisite for feature extraction. Here, we study the dynamics of feature learning in non-linear, shallow autoencoders. We derive a set of asymptotically exact equations that describe the generalisation dynamics of autoencoders trained with stochastic gradient descent (SGD) in the limit of high-dimensional inputs. These equations reveal that autoencoders learn the leading principal components of their inputs sequentially. An analysis of the long-time dynamics explains the failure of sigmoidal autoencoders to learn with tied weights, and highlights the importance of training the bias in ReLU autoencoders. Building on previous results for linear networks, we analyse a modification of the vanilla SGD algorithm which allows learning of the exact principal components. Finally, we show that our equations accurately describe the generalisation dynamics of non-linear autoencoders on realistic datasets such as CIFAR10.
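
To make the setting in the abstract concrete, here is a minimal NumPy sketch of a shallow, non-linear autoencoder trained with single-sample (online) SGD. It is an illustration under stated assumptions, not the authors' code: the tanh activation, the untied encoder and decoder, the 1/sqrt(D) scaling of the pre-activations, the synthetic Gaussian inputs with two high-variance directions, and every function name and hyperparameter are choices made here for illustration only.

```python
import numpy as np


def make_inputs(n, dim, stds, rng):
    """Gaussian inputs whose leading principal directions are the first axes.

    Synthetic stand-in for correlated data (an assumption of this sketch):
    axis i has standard deviation stds[i] for i < len(stds), and 1 otherwise.
    """
    x = rng.standard_normal((n, dim))
    x[:, : len(stds)] *= np.asarray(stds, dtype=float)
    return x


def reconstruction_error(x, w, v):
    """Mean squared reconstruction error of x_hat = V tanh(W x / sqrt(D))."""
    hidden = np.tanh(x @ w.T / np.sqrt(x.shape[1]))
    return float(np.mean(np.sum((x - hidden @ v.T) ** 2, axis=1)))


def train_online_sgd(x, n_hidden, lr, rng):
    """One pass of single-sample SGD on 0.5 * ||x - V tanh(W x / sqrt(D))||^2."""
    dim = x.shape[1]
    w = 0.1 * rng.standard_normal((n_hidden, dim))  # encoder weights
    v = 0.1 * rng.standard_normal((dim, n_hidden))  # decoder weights (untied)
    for sample in x:
        pre = w @ sample / np.sqrt(dim)   # hidden pre-activations
        act = np.tanh(pre)                # hidden activations
        err = v @ act - sample            # reconstruction minus input
        # Gradients of the per-sample squared error, both taken at the
        # current weights before either matrix is updated.
        grad_v = np.outer(err, act)
        grad_w = np.outer((v.T @ err) * (1.0 - act**2), sample) / np.sqrt(dim)
        v -= lr * grad_v
        w -= lr * grad_w
    return w, v


def leading_overlaps(w, n_leading):
    """Absolute cosine between each encoder row and the first n_leading axes."""
    unit = w / np.linalg.norm(w, axis=1, keepdims=True)
    return np.abs(unit[:, :n_leading])


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    train = make_inputs(20000, 20, [5.0, 3.0], rng)
    test = make_inputs(2000, 20, [5.0, 3.0], rng)
    w, v = train_online_sgd(train, n_hidden=2, lr=1e-3, rng=rng)
    print("held-out reconstruction error:", reconstruction_error(test, w, v))
    print("encoder overlaps with leading axes:\n", leading_overlaps(w, 2))
```

Calling `leading_overlaps` at intervals during training, rather than only at the end, is one way to check in this toy setting whether the encoder picks up the higher-variance direction before the lower-variance one, as the abstract describes for the leading principal components.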

Categories: stat.ML, cond-mat.dis-nn, cond-mat.stat-mech, cs.LG

Keywords: non-linear autoencoders, representation learning, generalisation dynamics, stochastic gradient descent, simplest neural network

Related articles:

arXiv:1908.07607 [stat.ML] (Published 2019-08-20)

Automatic and Simultaneous Adjustment of Learning Rate and Momentum for Stochastic Gradient Descent

Tomer Lancewicki, Selcuk Kopru

arXiv:1805.07960 [stat.ML] (Published 2018-05-21)

Stochastic Gradient Descent for Stochastic Doubly-Nonconvex Composite Optimization

Takayuki Kawashima, Hironori Fujisawa

arXiv:2207.04922 [stat.ML] (Published 2022-07-11)

On uniform-in-time diffusion approximation for stochastic gradient descent

Lei Li, Yuliang Wang