arXiv:2011.01424 [cs.CV]

In Defense of Feature Mimicking for Knowledge Distillation

Published 2020-11-03, Version 1

Knowledge distillation (KD) is a popular method to train efficient networks ("student") with the help of high-capacity networks ("teacher"). Traditional methods use the teacher's soft logits as extra supervision to train the student network. In this paper, we argue that it is more advantageous to make the student mimic the teacher's features in the penultimate layer. Not only can the student directly learn more effective information from the teacher's features, but feature mimicking can also be applied to teachers trained without a softmax layer. Experiments show that it can achieve higher accuracy than traditional KD. To further facilitate feature mimicking, we decompose a feature vector into its magnitude and its direction. We argue that the teacher should give more freedom to the magnitude of the student's feature and let the student pay more attention to mimicking the feature direction. To meet this requirement, we propose a loss term based on locality-sensitive hashing (LSH). With the help of this new loss, our method indeed mimics feature directions more accurately, relaxes constraints on feature magnitudes, and achieves state-of-the-art distillation accuracy.
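The magnitude/direction decomposition and the idea of an LSH-based loss can be illustrated with a small sketch. The abstract does not give the exact form of the loss, so everything below is an assumption made for illustration: sign-random-projection hashing with Gaussian hyperplanes, a binary cross-entropy between the teacher's hash bits and the sigmoid of the student's projections, and the hypothetical function names `decompose`, `random_hyperplanes`, `lsh_bits`, and `lsh_mimic_loss`. It uses only the Python standard library and is not the authors' implementation.

```python
import math
import random


def decompose(feature):
    """Split a feature vector into (magnitude, unit-length direction)."""
    magnitude = math.sqrt(sum(x * x for x in feature))
    if magnitude == 0.0:
        raise ValueError("a zero vector has no direction")
    return magnitude, [x / magnitude for x in feature]


def random_hyperplanes(num_hashes, dim, seed=0):
    """Draw Gaussian hyperplane normals (assumed hashing scheme)."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)]
            for _ in range(num_hashes)]


def project(planes, feature):
    """Dot product of the feature with every hyperplane normal."""
    return [sum(w * x for w, x in zip(row, feature)) for row in planes]


def lsh_bits(planes, feature):
    """Hash code: one bit per hyperplane, set when the projection is positive."""
    return [1 if p > 0 else 0 for p in project(planes, feature)]


def lsh_mimic_loss(planes, teacher_feature, student_feature):
    """Assumed loss: mean binary cross-entropy between the teacher's hash
    bits and the sigmoid of the student's projections."""
    targets = lsh_bits(planes, teacher_feature)
    logits = project(planes, student_feature)
    total = 0.0
    for t, z in zip(targets, logits):
        # Numerically stable cross-entropy computed directly from the logit z.
        total += max(z, 0.0) - z * t + math.log1p(math.exp(-abs(z)))
    return total / len(targets)


if __name__ == "__main__":
    teacher = [3.0, 4.0, 0.0, -1.0]
    planes = random_hyperplanes(num_hashes=16, dim=len(teacher))
    same_direction = [2.0 * x for x in teacher]   # different magnitude only
    opposite = [-x for x in teacher]              # different direction
    print(decompose(teacher))
    print(lsh_bits(planes, teacher) == lsh_bits(planes, same_direction))
    print(lsh_mimic_loss(planes, teacher, same_direction))
    print(lsh_mimic_loss(planes, teacher, opposite))
```

Under these assumptions, the hash bits depend only on which side of each hyperplane the feature falls, so rescaling a feature by a positive factor leaves its code unchanged, while a student pointing in the opposite direction is penalized more heavily. This is one way a loss can constrain direction more than magnitude.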

Categories: cs.CV, cs.LG

Keywords: knowledge distillation, feature mimicking, achieves state-of-the-art distillation accuracy, train efficient networks, mimics feature directions

Related articles:

arXiv:2108.00587 [cs.CV] (Published 2021-08-02)

Semi-Supervising Learning, Transfer Learning, and Knowledge Distillation with SimCLR

Khoi Nguyen, Yen Nguyen, Bao Le

arXiv:2304.01029 [cs.CV] (Published 2023-04-03)

Domain Generalization for Crop Segmentation with Knowledge Distillation

Simone Angarano, Mauro Martini, Alessandro Navone, Marcello Chiaberge

arXiv:1907.09643 [cs.CV] (Published 2019-07-23)

Highlight Every Step: Knowledge Distillation via Collaborative Teaching

Haoran Zhao, Xin Sun, Junyu Dong, Changrui Chen, Zihe Dong