arXiv:1807.01251 [cs.LG]

Training behavior of deep neural network in frequency domain

Zhi-Qin J. Xu, Yaoyu Zhang, Yanyang Xiao

Published 2018-07-03 (Version 1)

Why deep neural networks (DNNs) that are capable of overfitting often generalize well in practice is a mystery in deep learning. Existing works indicate that this observation holds for both complicated real datasets and simple datasets of one-dimensional (1-d) functions. In this work, for general low-frequency-dominant 1-d functions, we find that a DNN with common settings first quickly captures the dominant low-frequency components and then relatively slowly captures the high-frequency ones. We call this phenomenon the Frequency Principle (F-Principle). In our experiments, the F-Principle can be observed across various DNN setups with different activation functions, layer structures, and training algorithms. The F-Principle can be used to understand (i) the behavior of DNN training in the information plane and (ii) why DNNs often generalize well despite their ability to overfit. The F-Principle can potentially provide insight into the general principles underlying DNN optimization and generalization on real datasets.
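The F-Principle can be checked directly by comparing the Fourier spectra of the DNN output and the target during training. Below is a minimal sketch, not the authors' code: the network architecture, target function, and hyperparameters are illustrative assumptions. It fits a small fully connected PyTorch network to a low-frequency-dominant 1-d function and uses NumPy's FFT to track the relative error of one low- and one high-frequency component.

import numpy as np
import torch
import torch.nn as nn

torch.manual_seed(0)

# Low-frequency-dominant 1-d target: a dominant component at frequency
# index k=1 plus a smaller one at k=10 over the window [-1, 1].
x = torch.linspace(-1, 1, 256).unsqueeze(1)
y = torch.sin(np.pi * x) + 0.2 * torch.sin(10 * np.pi * x)

# A small fully connected tanh network (illustrative architecture).
model = nn.Sequential(nn.Linear(1, 200), nn.Tanh(),
                      nn.Linear(200, 200), nn.Tanh(),
                      nn.Linear(200, 1))
opt = torch.optim.Adam(model.parameters(), lr=1e-3)

target_spec = np.fft.rfft(y.squeeze().numpy())  # spectrum of the target

for step in range(5001):
    opt.zero_grad()
    pred = model(x)
    loss = ((pred - y) ** 2).mean()  # mean squared error
    loss.backward()
    opt.step()
    if step % 1000 == 0:
        # Relative error of each Fourier component of the DNN output.
        pred_spec = np.fft.rfft(pred.detach().squeeze().numpy())
        rel_err = np.abs(pred_spec - target_spec) / (np.abs(target_spec) + 1e-8)
        # Under the F-Principle, the low-frequency error (k=1) typically
        # shrinks well before the high-frequency error (k=10).
        print(f"step {step:5d}  err(k=1)={rel_err[1]:.3f}  err(k=10)={rel_err[10]:.3f}")

If the F-Principle holds for this setup, the printed error at k=1 should drop close to zero within the first few thousand steps while the error at k=10 decays noticeably more slowly.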

Related articles:
arXiv:1905.07777 [cs.LG] (Published 2019-05-19)
A type of generalization error induced by initialization in deep neural networks
arXiv:1905.12213 [cs.LG] (Published 2019-05-29)
Where is the Information in a Deep Neural Network?
arXiv:1905.03381 [cs.LG] (Published 2019-05-08)
AutoAssist: A Framework to Accelerate Training of Deep Neural Networks