arXiv Analytics

arXiv:1902.05967 [cs.LG]

Parameter Efficient Training of Deep Convolutional Neural Networks by Dynamic Sparse Reparameterization

Hesham Mostafa, Xin Wang

Published 2019-02-15, Version 1

Deep neural networks are typically highly over-parameterized: pruning techniques can remove a significant fraction of network parameters with little loss in accuracy. Recently, techniques based on dynamic re-allocation of non-zero parameters have emerged for training sparse networks directly, without having to train a large dense model beforehand. We present a parameter re-allocation scheme that addresses the limitations of previous methods, such as their high computational cost and the fixed number of parameters they allocate to each layer. We investigate the performance of these dynamic re-allocation methods in deep convolutional networks and show that our method outperforms previous static and dynamic parameterization methods, yielding the best accuracy for a given number of training parameters and performing on par with networks obtained by iteratively pruning a trained dense model. We further investigate the mechanisms underlying the superior performance of the resulting sparse networks. We find that neither the structure nor the initialization of the sparse networks discovered by our parameter re-allocation scheme is sufficient to explain their superior generalization performance. Rather, it is the continuous exploration of different sparse network structures during training that is critical to effective learning. We show that it is more fruitful to explore these structural degrees of freedom than to add extra parameters to the network.
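The re-allocation idea described in the abstract can be sketched as a periodic prune-then-regrow step applied to each layer's weight mask. The function below is a hypothetical simplification for illustration only, not the paper's exact algorithm: it prunes active weights whose magnitude falls below a global threshold, then redistributes the freed parameter budget across layers in proportion to each layer's surviving non-zero count, growing new connections at randomly chosen inactive positions.

```python
import numpy as np

def sparse_step(weights, masks, threshold, rng):
    """One hypothetical prune-and-regrow step over a list of flat
    weight arrays and matching 0/1 masks. Returns the pruned count."""
    # 1) Prune: deactivate weights whose magnitude fell below the
    #    global threshold (the paper tunes this threshold adaptively).
    pruned = 0
    survivors = []
    for w, m in zip(weights, masks):
        small = (np.abs(w) < threshold) & (m == 1)
        pruned += int(small.sum())
        m[small] = 0
        w[small] = 0.0
        survivors.append(int(m.sum()))

    # 2) Regrow: hand the freed budget back, giving each layer a share
    #    proportional to its surviving non-zero count (a hypothetical
    #    stand-in for the paper's growth heuristic).
    total = sum(survivors) or 1
    for w, m, s in zip(weights, masks, survivors):
        grow = int(round(pruned * s / total))
        free = np.flatnonzero(m == 0)
        pick = rng.choice(free, size=min(grow, free.size), replace=False)
        np.put(m, pick, 1)    # activate new positions...
        np.put(w, pick, 0.0)  # ...initialized to zero
    return pruned
```

In a training loop this step would run every few hundred iterations between ordinary gradient updates, with gradients masked so that only active (mask = 1) positions are updated.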

Related articles:
arXiv:1901.08624 [cs.LG] (Published 2019-01-24)
AutoShuffleNet: Learning Permutation Matrices via an Exact Lipschitz Continuous Penalty in Deep Convolutional Neural Networks
arXiv:1809.09399 [cs.LG] (Published 2018-09-25)
Non-Iterative Knowledge Fusion in Deep Convolutional Neural Networks
arXiv:1809.05606 [cs.LG] (Published 2018-09-14)
Non-iterative recomputation of dense layers for performance improvement of DCNN