arXiv Analytics

arXiv:1805.08266 [stat.ML]

On the Selection of Initialization and Activation Function for Deep Neural Networks

Soufiane Hayou, Arnaud Doucet, Judith Rousseau

Published 2018-05-21 (Version 1)

The weight initialization and the activation function of deep neural networks have a crucial impact on the performance of the learning procedure. An inappropriate selection can lead to the loss of input information during forward propagation and to exponentially vanishing/exploding gradients during back-propagation. Understanding the theoretical properties of untrained random networks is key to identifying which deep networks may be trained successfully, as recently demonstrated by Schoenholz et al. (2017), who showed that for deep feedforward neural networks only a specific choice of hyperparameters, known as the 'edge of chaos', can lead to good performance. We complete these recent results by showing quantitatively that, for a class of ReLU-like activation functions, information indeed propagates deeper when the network is initialized on the edge of chaos. By extending our analysis to a larger class of functions, we then identify an activation function, $\phi_{new}(x) = x \cdot \text{sigmoid}(x)$, which improves information propagation over ReLU-like functions and does not suffer from the vanishing gradient problem. We demonstrate empirically that this activation function, combined with a random initialization on the edge of chaos, outperforms standard approaches. This complements recent independent work by Ramachandran et al. (2017), who observed empirically, in extensive simulations, that this activation function performs better than many alternatives.
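
For a concrete handle on the quantities involved, the sketch below (a minimal illustration under stated assumptions, not the authors' code) implements $\phi_{new}$ in NumPy and propagates a random input through an untrained fully connected network. The default initialization $(\sigma_b, \sigma_w) = (0, \sqrt{2})$ is the known edge-of-chaos point for ReLU in this mean-field framework; the critical values for other activations, including $\phi_{new}$, are derived in the paper and would replace these constants.

```python
import numpy as np

def phi_new(x):
    """The activation studied in the paper: phi_new(x) = x * sigmoid(x)
    (the same function Ramachandran et al. (2017) call Swish)."""
    return x / (1.0 + np.exp(-x))

def forward_depth_demo(depth=50, width=500,
                       sigma_w=np.sqrt(2.0), sigma_b=0.0, seed=0):
    """Propagate a random input through an untrained network with
    weights ~ N(0, sigma_w^2 / width) and biases ~ N(0, sigma_b^2),
    recording the pre-activation variance at each layer.
    (sigma_b, sigma_w) = (0, sqrt(2)) is the ReLU edge-of-chaos choice;
    this demo is an assumption-laden sketch, not the paper's analysis."""
    rng = np.random.default_rng(seed)
    x = rng.normal(size=width)
    variances = []
    for _ in range(depth):
        W = rng.normal(0.0, sigma_w / np.sqrt(width), size=(width, width))
        b = rng.normal(0.0, sigma_b, size=width) if sigma_b > 0 else np.zeros(width)
        pre = W @ x + b                # pre-activations of this layer
        variances.append(pre.var())    # track how signal variance evolves
        x = phi_new(pre)
    return variances

if __name__ == "__main__":
    v = forward_depth_demo()
    print(f"layer 1 variance: {v[0]:.3f}, layer {len(v)} variance: {v[-1]:.3f}")
```

Run at the edge of chaos, the per-layer pre-activation variance stays roughly stable with depth, whereas a noticeably smaller (larger) $\sigma_w$ drives it toward zero (infinity), which is the order/chaos degeneration the abstract alludes to.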

Related articles:
arXiv:1802.07714 [stat.ML] (Published 2018-02-21)
Detecting Learning vs Memorization in Deep Neural Networks using Shared Structure Validation Sets
arXiv:1402.1869 [stat.ML] (Published 2014-02-08, updated 2014-06-07)
On the Number of Linear Regions of Deep Neural Networks
arXiv:1901.02182 [stat.ML] (Published 2019-01-08)
Comments on "Deep Neural Networks with Random Gaussian Weights: A Universal Classification Strategy?"