arXiv:1609.09106 [cs.LG]

HyperNetworks

David Ha, Andrew Dai, Quoc V. Le

Published 2016-09-27 (Version 1)

This work explores hypernetworks: an approach in which a small network, known as a hypernetwork, generates the weights for a larger network. Hypernetworks provide an abstraction that is similar to what is found in nature: the relationship between a genotype (the hypernetwork) and a phenotype (the main network). Though they are also reminiscent of HyperNEAT in evolutionary computation, our hypernetworks are trained end-to-end with backpropagation and are thus usually faster. The focus of this work is to make hypernetworks useful for deep convolutional networks and long recurrent networks, where hypernetworks can be viewed as a relaxed form of weight-sharing across layers. Our main result is that hypernetworks can generate non-shared weights for LSTMs and achieve state-of-the-art results on a variety of language modeling tasks with the Character-Level Penn Treebank and Hutter Prize Wikipedia datasets, challenging the weight-sharing paradigm for recurrent networks. Our results also show that hypernetworks applied to convolutional networks still achieve respectable results on image recognition tasks compared to state-of-the-art baseline models, while requiring fewer learnable parameters.
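
To make the core mechanism concrete, the sketch below shows the idea in minimal PyTorch: a small shared hypernetwork maps a learnable per-layer embedding (the "genotype") to the weight matrix of each main-network layer (the "phenotype"), giving the relaxed weight-sharing described above: layers share the hypernetwork, not the weights. This is an illustrative sketch under assumptions, not the authors' implementation; the names HyperNet and MainNet, the layer sizes, the embedding dimension, and the ReLU nonlinearity are all hypothetical choices.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class HyperNet(nn.Module):
        """Small network mapping a layer embedding z to a full weight matrix."""
        def __init__(self, z_dim, out_features, in_features):
            super().__init__()
            self.out_features, self.in_features = out_features, in_features
            self.proj = nn.Linear(z_dim, out_features * in_features)

        def forward(self, z):
            # Generate and reshape the weights for one main-network layer.
            return self.proj(z).view(self.out_features, self.in_features)

    class MainNet(nn.Module):
        """Main network whose layer weights all come from one shared hypernetwork.
        Each layer stores only a small embedding z, so weights are non-shared
        across layers even though the generator is shared."""
        def __init__(self, dim=64, n_layers=4, z_dim=8):
            super().__init__()
            self.hyper = HyperNet(z_dim, dim, dim)                 # shared generator
            self.zs = nn.Parameter(torch.randn(n_layers, z_dim))  # per-layer embeddings
            self.bias = nn.Parameter(torch.zeros(n_layers, dim))

        def forward(self, x):
            for i in range(self.zs.size(0)):
                w = self.hyper(self.zs[i])          # weights generated on the fly
                x = torch.relu(F.linear(x, w, self.bias[i]))
            return x

    x = torch.randn(2, 64)
    net = MainNet()
    print(net(x).shape)  # torch.Size([2, 64])

Because the generated weights are a differentiable function of the embeddings, gradients flow through F.linear back into both the embeddings and the hypernetwork, so the whole system trains end-to-end with ordinary backpropagation, in contrast to the evolutionary search used by HyperNEAT.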

Related articles:
arXiv:2006.13915 [cs.LG] (Published 2020-06-24)
Hierarchically Local Tasks and Deep Convolutional Networks
arXiv:2305.08404 [cs.LG] (Published 2023-05-15)
Theoretical Analysis of Inductive Biases in Deep Convolutional Networks
arXiv:1511.06072 [cs.LG] (Published 2015-11-19)
Mediated Experts for Deep Convolutional Networks