arXiv:2108.06681 [cs.CV]

Multi-granularity for knowledge distillation

Baitan Shao, Ying Chen

Published 2021-08-15 (Version 1)

Considering that student networks differ in their ability to absorb the knowledge imparted by a teacher, a multi-granularity distillation mechanism is proposed to transfer more understandable knowledge to the student network. A multi-granularity self-analyzing module for the teacher network is designed, which enables the student network to learn from different teaching patterns. Furthermore, a stable excitation scheme is proposed to provide robust supervision during student training. The proposed distillation mechanism can be embedded into different distillation frameworks, which are taken as baselines. Experiments show that the mechanism improves accuracy over the baselines by 0.58% on average and by 1.08% in the best case, outperforming state-of-the-art methods. It is also shown that the proposed mechanism improves the student's fine-tuning ability and robustness to noisy inputs. The code is available at https://github.com/shaoeric/multi-granularity-distillation.
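
For readers unfamiliar with distillation at multiple "granularities", the following is a minimal sketch of a generic multi-level knowledge-distillation loss in PyTorch. It illustrates only the general idea of distilling the same teacher output at several softness levels; it is not the authors' self-analyzing module or stable excitation scheme, and all function names and temperature choices are assumptions for illustration.

    # Illustrative sketch only; not the method from arXiv:2108.06681.
    import torch
    import torch.nn.functional as F

    def kd_loss(student_logits, teacher_logits, temperature):
        # Standard Hinton-style soft-target distillation loss at one temperature.
        log_p_student = F.log_softmax(student_logits / temperature, dim=1)
        p_teacher = F.softmax(teacher_logits / temperature, dim=1)
        return F.kl_div(log_p_student, p_teacher,
                        reduction="batchmean") * temperature ** 2

    def multi_granularity_kd_loss(student_logits, teacher_logits,
                                  temperatures=(1.0, 2.0, 4.0)):
        # Average the distillation loss over several temperatures, treating each
        # as a different granularity (softness) of the teacher's knowledge.
        losses = [kd_loss(student_logits, teacher_logits, t) for t in temperatures]
        return torch.stack(losses).mean()

    # Usage: combine with the usual cross-entropy on ground-truth labels.
    # total_loss = F.cross_entropy(student_logits, labels) \
    #     + multi_granularity_kd_loss(student_logits, teacher_logits.detach())

In practice such a term is simply added to the student's supervised loss; the referenced repository should be consulted for the actual module and training scheme.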

Related articles:
arXiv:1907.09643 [cs.CV] (Published 2019-07-23)
Highlight Every Step: Knowledge Distillation via Collaborative Teaching
arXiv:1909.10754 [cs.CV] (Published 2019-09-24)
FEED: Feature-level Ensemble for Knowledge Distillation
arXiv:2006.03810 [cs.CV] (Published 2020-06-06)
An Empirical Analysis of the Impact of Data Augmentation on Knowledge Distillation