arXiv:2107.10199 [cs.LG]

Distribution of Classification Margins: Are All Data Equal?

Andrzej Banburski, Fernanda De La Torre, Nishka Pant, Ishana Shastri, Tomaso Poggio

Published 2021-07-21, Version 1

Recent theoretical results show that gradient descent on deep neural networks under exponential loss functions locally maximizes the classification margin, which is equivalent to minimizing the norm of the weight matrices under margin constraints. However, this property of the solution does not fully characterize generalization performance. We motivate theoretically and show empirically that the area under the curve of the margin distribution on the training set is in fact a good measure of generalization. We then show that, after data separation is achieved, it is possible to dynamically reduce the training set by more than 99% without significant loss of performance. Interestingly, the resulting subset of "high capacity" features varies across training runs, in agreement with the theoretical claim that all training points should converge to the same asymptotic margin under SGD in the presence of both batch normalization and weight decay.
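The two quantities in the abstract, the area under the margin-distribution curve and a margin-based reduction of the training set, can be illustrated with a short sketch. This is not the authors' implementation. It assumes the standard multiclass margin (true-class score minus the largest competing score), uses a hypothetical scalar `norm_product` as a stand-in for whatever weight-norm normalization the paper applies, and reads "area under the curve of the margin distribution" as the area under the normalized margins sorted in ascending order and plotted against the fraction of training points. `lowest_margin_subset` is likewise a hypothetical selection rule (keep the smallest-margin points); the paper's actual pruning criterion may differ.

```python
import numpy as np


def multiclass_margins(logits: np.ndarray, labels: np.ndarray) -> np.ndarray:
    """Per-sample margin: true-class score minus the best competing score."""
    n = logits.shape[0]
    rows = np.arange(n)
    true_scores = logits[rows, labels]
    # Copy as float and mask the true class so the max covers only the other classes.
    others = logits.astype(float)
    others[rows, labels] = -np.inf
    return true_scores - others.max(axis=1)


def margin_distribution_auc(margins: np.ndarray, norm_product: float = 1.0) -> float:
    """Area under the sorted, normalized margin curve on x in [0, 1].

    Assumption: the curve is the ascending normalized margins plotted against
    the fraction of training points; `norm_product` is a hypothetical stand-in
    for the paper's normalization (e.g. a product of layer weight norms).
    """
    y = np.sort(margins / norm_product)
    if y.size == 1:
        return float(y[0])
    x = np.linspace(0.0, 1.0, y.size)
    # Trapezoidal rule, written out to avoid NumPy version differences.
    return float(np.sum((x[1:] - x[:-1]) * (y[1:] + y[:-1]) / 2.0))


def lowest_margin_subset(margins: np.ndarray, keep_fraction: float) -> np.ndarray:
    """Indices of the smallest-margin points (hypothetical pruning rule)."""
    k = max(1, int(round(keep_fraction * margins.size)))
    return np.argsort(margins)[:k]


if __name__ == "__main__":
    # Toy logits for 4 samples and 3 classes (illustrative values only).
    logits = np.array([[2.0, 0.5, 0.1],
                       [0.2, 1.0, 0.9],
                       [3.0, 0.0, 2.5],
                       [0.1, 0.3, 1.5]])
    labels = np.array([0, 1, 0, 2])
    m = multiclass_margins(logits, labels)
    print("margins:", m)
    print("AUC:", margin_distribution_auc(m))
    print("kept indices:", lowest_margin_subset(m, keep_fraction=0.5))
```

Keeping the lowest-margin points mirrors the intuition that well-separated, high-margin samples contribute little gradient under an exponential-type loss once the data are separated; in a real training loop the subset would be recomputed periodically ("dynamically") rather than once.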

Comments: Previously available online as CBMM Memo 115 on the MIT CBMM website

Categories: cs.LG, cs.AI, stat.ML