arXiv Analytics


arXiv:2001.10509 [cs.LG]

MSE-Optimal Neural Network Initialization via Layer Fusion

Ramina Ghods, Andrew S. Lan, Tom Goldstein, Christoph Studer

Published 2020-01-28 (Version 1)

Deep neural networks achieve state-of-the-art performance for a range of classification and inference tasks. However, the use of stochastic gradient descent combined with the nonconvexity of the underlying optimization problems renders parameter learning susceptible to initialization. To address this issue, a variety of methods that rely on random parameter initialization or knowledge distillation have been proposed in the past. In this paper, we propose FuseInit, a novel method to initialize shallower networks by fusing neighboring layers of deeper networks that are trained with random initialization. We develop theoretical results and efficient algorithms for mean-square error (MSE)-optimal fusion of neighboring dense-dense, convolutional-dense, and convolutional-convolutional layers. We show experiments for a range of classification and regression datasets, which suggest that deeper neural networks are less sensitive to initialization and shallower networks can perform better (sometimes as well as their deeper counterparts) if initialized with FuseInit.
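As a rough illustration of the layer-fusion idea (not the paper's closed-form MSE-optimal derivation), the sketch below fits a single dense layer to the input-output map of two stacked dense layers by least-squares over sampled inputs; the layer sizes, ReLU activation, and Gaussian input distribution are assumptions made purely for this example.

```python
import numpy as np

# Hypothetical sizes and activation; not taken from the paper.
rng = np.random.default_rng(0)
d_in, d_hidden, d_out, n_samples = 32, 64, 16, 10000

# Two stacked ("deeper") dense layers with a ReLU in between.
W1 = rng.standard_normal((d_hidden, d_in)) / np.sqrt(d_in)
b1 = 0.1 * rng.standard_normal(d_hidden)
W2 = rng.standard_normal((d_out, d_hidden)) / np.sqrt(d_hidden)
b2 = 0.1 * rng.standard_normal(d_out)

X = rng.standard_normal((n_samples, d_in))   # assumed input distribution
H = np.maximum(X @ W1.T + b1, 0.0)           # hidden activations of the deeper network
Y = H @ W2.T + b2                            # output of the two-layer (dense-dense) block

# Fuse: least-squares fit of a single dense layer (with bias) to the two-layer map,
# i.e. argmin_{W,b} E ||W x + b - y||^2 over the sampled inputs.
X_aug = np.hstack([X, np.ones((n_samples, 1))])
sol, *_ = np.linalg.lstsq(X_aug, Y, rcond=None)
W_fused, b_fused = sol[:-1].T, sol[-1]

# W_fused, b_fused would then initialize the corresponding layer of the shallower network.
mse = np.mean((X_aug @ sol - Y) ** 2)
print(f"MSE of fused single layer vs. two-layer map: {mse:.4f}")
```

In this empirical variant, the quality of the fused layer depends on how well the sampled inputs cover the activations the deeper network actually sees; the paper instead derives the MSE-optimal fusion analytically for dense-dense, convolutional-dense, and convolutional-convolutional layer pairs.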

Related articles:
arXiv:2007.14917 [cs.LG] (Published 2020-07-29)
Compressing Deep Neural Networks via Layer Fusion
arXiv:2201.11218 [cs.LG] (Published 2022-01-26)
DNNFuser: Generative Pre-Trained Transformer as a Generalized Mapper for Layer Fusion in DNN Accelerators
arXiv:2409.09258 [cs.LG] (Published 2024-09-14)
Active Learning to Guide Labeling Efforts for Question Difficulty Estimation