arXiv:1611.01353 [stat.ML]

Information Dropout: learning optimal representations through noise

Alessandro Achille, Stefano Soatto

Published 2016-11-04, Version 1

We introduce Information Dropout, a generalization of dropout that is motivated by the Information Bottleneck principle and highlights how injecting noise into the activations can help in learning optimal representations of the data. Information Dropout is rooted in information-theoretic principles; it includes several existing dropout methods as special cases, such as Gaussian Dropout and Variational Dropout, and, unlike classical dropout, it can learn and build representations that are invariant to nuisances of the data, such as occlusions and clutter. When the task is reconstruction of the input, we show that Information Dropout yields a variational autoencoder as a special case, thus providing a link between representation learning, information theory and variational inference. Our experiments validate the theoretical intuitions behind the method, and we find that Information Dropout achieves generalization performance comparable to or better than binary dropout, especially on smaller models, since it can automatically adapt the noise to the structure of the network as well as to the test sample.

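To make the mechanism concrete, below is a minimal sketch (not the authors' code) of an information-dropout-style layer in PyTorch: each activation is multiplied by log-normal noise whose scale alpha(x) is predicted from the input, and a simplified penalty discourages the layer from transmitting more information than needed. The class name InformationDropoutLayer, the fc_alpha head, the max_alpha cap, the test-time behaviour and the exact form of the penalty are illustrative assumptions, not details taken from the paper.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class InformationDropoutLayer(nn.Module):
        # Fully connected layer with multiplicative log-normal noise; the noise
        # scale alpha(x) is predicted from the input by a second linear head
        # (assumed design, for illustration only).
        def __init__(self, in_features, out_features, max_alpha=0.7):
            super().__init__()
            self.fc = nn.Linear(in_features, out_features)        # mean activation
            self.fc_alpha = nn.Linear(in_features, out_features)  # noise-scale head
            self.max_alpha = max_alpha

        def forward(self, x):
            mean = F.softplus(self.fc(x))
            # alpha(x): per-unit standard deviation of the log-noise, kept in (0, max_alpha)
            alpha = self.max_alpha * torch.sigmoid(self.fc_alpha(x))
            if self.training:
                eps = torch.exp(alpha * torch.randn_like(mean))   # log-normal noise
                out = mean * eps
            else:
                out = mean                                        # noise off at test time (simplification)
            # Simplified information penalty (assumed form): small alpha means the
            # activation is transmitted almost noiselessly, so it is penalized more.
            kl = -torch.log(alpha / self.max_alpha + 1e-8).mean()
            return out, kl

In training, the per-layer kl terms would be summed, scaled by a trade-off coefficient (the beta of the Information Bottleneck Lagrangian), and added to the task loss; because alpha(x) depends on the input, the amount of noise adapts to each sample, which is the behaviour the abstract refers to.
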
Related articles:
arXiv:1902.08341 [stat.ML] (Published 2019-02-22)
FAVAE: Sequence Disentanglement using Information Bottleneck Principle
arXiv:2305.17225 [stat.ML] (Published 2023-05-26)
Causal Component Analysis
arXiv:1907.10477 [stat.ML] (Published 2019-07-24)
On the relationship between variational inference and adaptive importance sampling