arXiv Analytics

arXiv:1911.09074 [cs.CV]

Search to Distill: Pearls are Everywhere but not the Eyes

Yu Liu, Xuhui Jia, Mingxing Tan, Raviteja Vemulapalli, Yukun Zhu, Bradley Green, Xiaogang Wang

Published 2019-11-20 (Version 1)

Standard Knowledge Distillation (KD) approaches distill the knowledge of a cumbersome teacher model into the parameters of a student model with a pre-defined architecture. However, the knowledge of a neural network, which is represented by the network's output distribution conditioned on its input, depends not only on its parameters but also on its architecture. Hence, a more generalized approach for KD is to distill the teacher's knowledge into both the parameters and the architecture of the student. To achieve this, we present a new Architecture-aware Knowledge Distillation (AKD) approach that finds student models (pearls for the teacher) that are best suited for distilling the given teacher model. In particular, we leverage Neural Architecture Search (NAS), equipped with our KD-guided reward, to search for the best student architectures for a given teacher. Experimental results show that our proposed AKD consistently outperforms the conventional NAS-plus-KD approach and achieves state-of-the-art results on the ImageNet classification task under various latency settings. Furthermore, the best AKD student architecture for the ImageNet classification task also transfers well to other tasks, such as million-level face recognition and ensemble learning.
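To make the KD-guided reward concrete, here is a minimal sketch, assuming a MnasNet-style latency-constrained reinforcement-learning search and a standard Hinton-style distillation loss. The function names, hyperparameter values, and PyTorch framing are illustrative assumptions for exposition, not the paper's released implementation: the key idea is only that the accuracy fed to the NAS controller is measured on a student trained with the distillation loss against the fixed teacher, rather than with plain labels.

```python
# Minimal sketch of a KD-guided NAS reward (assumptions: MnasNet-style
# latency-aware RL search, Hinton-style KD loss). Names and values below
# are illustrative placeholders, not the authors' code.
import torch
import torch.nn.functional as F


def kd_loss(student_logits, teacher_logits, labels, temperature=4.0, alpha=0.9):
    """Hinton-style distillation loss: soft-target KL plus hard-label CE."""
    soft = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=1),
        F.softmax(teacher_logits / temperature, dim=1),
        reduction="batchmean",
    ) * (temperature ** 2)
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1.0 - alpha) * hard


def kd_guided_reward(kd_val_accuracy, latency_ms, target_latency_ms=50.0, beta=-0.07):
    """Latency-constrained reward (MnasNet-style soft constraint), computed from
    the validation accuracy a sampled student reaches after a short proxy
    training run with kd_loss, which makes the search teacher-aware."""
    return kd_val_accuracy * (latency_ms / target_latency_ms) ** beta


# Toy usage with random tensors standing in for one proxy-task batch.
student_logits = torch.randn(8, 1000)
teacher_logits = torch.randn(8, 1000)
labels = torch.randint(0, 1000, (8,))
loss = kd_loss(student_logits, teacher_logits, labels)            # train the sampled student with this
reward = kd_guided_reward(kd_val_accuracy=0.75, latency_ms=42.0)  # feed this back to the NAS controller
```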

Related articles:
arXiv:2207.10425 [cs.CV] (Published 2022-07-21): KD-MVS: Knowledge Distillation Based Self-supervised Learning for MVS
arXiv:2304.01029 [cs.CV] (Published 2023-04-03): Domain Generalization for Crop Segmentation with Knowledge Distillation
arXiv:2011.01424 [cs.CV] (Published 2020-11-03): In Defense of Feature Mimicking for Knowledge Distillation