arXiv:1807.01251 [cs.LG]

Training behavior of deep neural network in frequency domain

Zhi-Qin J. Xu, Yaoyu Zhang, Yanyang Xiao

Published 2018-07-03 (Version 1)

Why deep neural networks (DNNs) that are capable of overfitting often generalize well in practice is a mystery in deep learning. Existing works indicate that this observation holds for both complicated real datasets and simple datasets of one-dimensional (1-d) functions. In this work, for general low-frequency-dominant 1-d functions, we find that a DNN with common settings first quickly captures the dominant low-frequency components and then relatively slowly captures the high-frequency ones. We call this phenomenon the Frequency Principle (F-Principle). In our experiments, the F-Principle can be observed across various DNN setups with different activation functions, layer structures, and training algorithms. The F-Principle can be used to understand (i) the behavior of DNN training in the information plane and (ii) why DNNs often generalize well despite their ability to overfit. The F-Principle can potentially provide insight into the general principles underlying DNN optimization and generalization on real datasets.
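The F-Principle can be checked directly by comparing the Fourier spectra of the DNN output and the target during training. Below is a minimal sketch, not the authors' code: the network architecture, target function, and hyperparameters are illustrative assumptions. It fits a small fully connected PyTorch network to a low-frequency-dominant 1-d function and uses NumPy's FFT to track the relative error of one low- and one high-frequency component.

import numpy as np
import torch
import torch.nn as nn

torch.manual_seed(0)

# Low-frequency-dominant 1-d target: a dominant component at frequency
# index k=1 plus a smaller one at k=10 over the window [-1, 1].
x = torch.linspace(-1, 1, 256).unsqueeze(1)
y = torch.sin(np.pi * x) + 0.2 * torch.sin(10 * np.pi * x)

# A small fully connected tanh network (illustrative architecture).
model = nn.Sequential(nn.Linear(1, 200), nn.Tanh(),
                      nn.Linear(200, 200), nn.Tanh(),
                      nn.Linear(200, 1))
opt = torch.optim.Adam(model.parameters(), lr=1e-3)

target_spec = np.fft.rfft(y.squeeze().numpy())  # spectrum of the target

for step in range(5001):
    opt.zero_grad()
    pred = model(x)
    loss = ((pred - y) ** 2).mean()  # mean squared error
    loss.backward()
    opt.step()
    if step % 1000 == 0:
        # Relative error of each Fourier component of the DNN output.
        pred_spec = np.fft.rfft(pred.detach().squeeze().numpy())
        rel_err = np.abs(pred_spec - target_spec) / (np.abs(target_spec) + 1e-8)
        # Under the F-Principle, the low-frequency error (k=1) typically
        # shrinks well before the high-frequency error (k=10).
        print(f"step {step:5d}  err(k=1)={rel_err[1]:.3f}  err(k=10)={rel_err[10]:.3f}")

If the F-Principle holds for this setup, the printed error at k=1 should drop close to zero within the first few thousand steps while the error at k=10 decays noticeably more slowly.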

Related articles:
arXiv:1905.07777 [cs.LG] (Published 2019-05-19)
A type of generalization error induced by initialization in deep neural networks
arXiv:1905.12213 [cs.LG] (Published 2019-05-29)
Where is the Information in a Deep Neural Network?
arXiv:1905.03381 [cs.LG] (Published 2019-05-08)
AutoAssist: A Framework to Accelerate Training of Deep Neural Networks