arXiv Analytics


arXiv:1904.01367 [stat.ML]

Why ResNet Works? Residuals Generalize

Fengxiang He, Tongliang Liu, Dacheng Tao

Published 2019-04-02 (Version 1)

Residual connections significantly boost the performance of deep neural networks. However, few theoretical results address how residuals influence the hypothesis complexity and generalization ability of deep neural networks. This paper studies the influence of residual connections on the hypothesis complexity of a neural network in terms of the covering number of its hypothesis space. We prove that the upper bound on the covering number is the same as that of a chain-like neural network, provided the total numbers of weight matrices and nonlinearities are fixed, regardless of whether they appear in the residuals. This result shows that residual connections may not increase the hypothesis complexity of a neural network compared with its chain-like counterpart. Based on this upper bound on the covering number, we then obtain an $\mathcal O(1 / \sqrt{N})$ margin-based multi-class generalization bound for ResNet, as an exemplary case of a deep neural network with residual connections. Generalization guarantees for similar state-of-the-art architectures, such as DenseNet and ResNeXt, follow straightforwardly. Our generalization bound yields a practical implication: to achieve good generalization, regularization terms should be used to keep the norms of the weight matrices from growing too large, which justifies the standard technique of weight decay.
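The practical implication above can be illustrated with a minimal sketch (not the paper's method, and using a toy least-squares model rather than a ResNet): adding a weight-decay term to the gradient update shrinks the norm of the learned weights, which is exactly the quantity the generalization bound asks us to control.

```python
import numpy as np

# Toy illustration (an assumption for this sketch, not from the paper):
# gradient descent on a least-squares loss, with and without weight
# decay (L2 regularization). The decay term pulls the weights toward
# zero, keeping their norm small.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 10))
w_true = rng.normal(size=10)
y = X @ w_true + 0.1 * rng.normal(size=50)


def train(weight_decay, lr=0.01, steps=500):
    """Plain gradient descent; weight_decay adds the term lambda * w."""
    w = np.zeros(10)
    for _ in range(steps):
        grad = X.T @ (X @ w - y) / len(y) + weight_decay * w
        w -= lr * grad
    return w


w_plain = train(weight_decay=0.0)
w_decay = train(weight_decay=1.0)

# The regularized solution has a strictly smaller weight norm.
print(np.linalg.norm(w_decay) < np.linalg.norm(w_plain))  # True
```

In deep-learning frameworks the same effect is obtained by the optimizer's weight-decay hyperparameter, which the bound suggests should be tuned to keep the product of weight-matrix norms moderate.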

Related articles:

- arXiv:1903.09215 [stat.ML] (Published 2019-03-21): Empirical confidence estimates for classification by deep neural networks
- arXiv:2202.07679 [stat.ML] (Published 2022-02-15): Taking a Step Back with KCal: Multi-Class Kernel-Based Calibration for Deep Neural Networks
- arXiv:1607.00485 [stat.ML] (Published 2016-07-02): Group Sparse Regularization for Deep Neural Networks