arXiv Analytics

arXiv:2007.14917 [cs.LG]

Compressing Deep Neural Networks via Layer Fusion

James O'Neill, Greg Ver Steeg, Aram Galstyan

Published 2020-07-29 (Version 1)

This paper proposes layer fusion, a model compression technique that discovers which weights to combine and then fuses the weights of similar fully-connected, convolutional and attention layers. Layer fusion can significantly reduce the number of layers in the original network with little additional computational overhead, while maintaining competitive performance. In experiments on CIFAR-10, we find that various deep convolutional neural networks remain within 2% accuracy points of the original networks up to a compression ratio of 3.33 when iteratively retrained with layer fusion. In experiments on the WikiText-2 language modelling dataset with pretrained transformer models, we achieve compression to 20% of the original network size while staying within 5 perplexity points of the original network. We also find that other well-established compression techniques can achieve competitive performance compared to their original networks given a sufficient number of retraining steps. Generally, we observe a clear inflection point in performance as the amount of compression increases, suggesting a bound on how much compression can be achieved before performance degrades exponentially.
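
The abstract describes the approach only at a high level: identify pairs of similar layers, fuse their weights into a single layer, and iteratively retrain. The PyTorch snippet below is a minimal illustrative sketch of that idea, fusing two same-shaped fully-connected layers by averaging their weights when their cosine similarity exceeds a threshold. The similarity measure, the averaging rule, and the threshold are assumptions made for illustration, not the paper's actual alignment and fusion criteria.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    def cosine_similarity_of_weights(layer_a: nn.Linear, layer_b: nn.Linear) -> float:
        """Cosine similarity between the flattened weight matrices of two layers."""
        wa = layer_a.weight.detach().flatten()
        wb = layer_b.weight.detach().flatten()
        return F.cosine_similarity(wa, wb, dim=0).item()

    def fuse_linear_layers(layer_a: nn.Linear, layer_b: nn.Linear) -> nn.Linear:
        """Fuse two same-shaped linear layers by averaging their parameters
        (an assumed fusion rule for illustration only)."""
        fused = nn.Linear(layer_a.in_features, layer_a.out_features,
                          bias=layer_a.bias is not None)
        with torch.no_grad():
            fused.weight.copy_((layer_a.weight + layer_b.weight) / 2)
            if layer_a.bias is not None:
                fused.bias.copy_((layer_a.bias + layer_b.bias) / 2)
        return fused

    # Example usage with two hypothetical layers of the same shape.
    layer1, layer2 = nn.Linear(512, 512), nn.Linear(512, 512)
    if cosine_similarity_of_weights(layer1, layer2) > 0.9:  # illustrative threshold
        shared = fuse_linear_layers(layer1, layer2)
        # 'shared' would replace both originals, reducing network depth;
        # the compressed model is then iteratively retrained to recover accuracy.

In the paper's setting, such fusion is applied repeatedly with retraining, which is what allows the reported compression ratios while keeping accuracy or perplexity close to the original network.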

Related articles:
arXiv:1904.06194 [cs.LG] (Published 2019-04-11)
Compressing deep neural networks by matrix product operators
arXiv:2003.06308 [cs.LG] (Published 2020-03-11)
Compressing deep neural networks on FPGAs to binary and ternary precision with HLS4ML
arXiv:2001.10509 [cs.LG] (Published 2020-01-28)
MSE-Optimal Neural Network Initialization via Layer Fusion