arXiv:2105.03343 [cs.LG]

Adapting by Pruning: A Case Study on BERT

Yang Gao, Nicolo Colombo, Wei Wang

Published 2021-05-07 (Version 1)

Adapting pre-trained neural models to downstream tasks has become the standard practice for obtaining high-quality models. In this work, we propose a novel model adaptation paradigm, adapting by pruning, which prunes neural connections in the pre-trained model to optimise the performance on the target task; all remaining connections keep their pre-trained weights intact. We formulate adapting-by-pruning as an optimisation problem with a differentiable loss and propose an efficient algorithm to prune the model. We prove that the algorithm is near-optimal under standard assumptions and apply the algorithm to adapt BERT to some GLUE tasks. Results suggest that our method can prune up to 50% of the weights in BERT while yielding performance comparable to that of the fully fine-tuned model. We also compare our method with other state-of-the-art pruning methods and study the topological differences of their obtained sub-networks.
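The abstract does not spell out the authors' algorithm, but the general idea (train a differentiable pruning mask while the pre-trained weights stay frozen) can be illustrated with a minimal PyTorch sketch. Everything here is an assumption for illustration: the class name PrunedLinear, the per-weight score parameters, and the straight-through sigmoid relaxation are one common way to make a binary mask trainable, not necessarily the paper's method.

```python
import torch
import torch.nn as nn

class PrunedLinear(nn.Module):
    """Linear layer whose pre-trained weights are frozen; only a
    real-valued score per weight is trained. The sign of each score
    decides whether the corresponding connection is kept or pruned.
    (Illustrative sketch, not the paper's exact algorithm.)"""

    def __init__(self, pretrained: nn.Linear):
        super().__init__()
        # Freeze the pre-trained weights: they are never updated.
        self.weight = nn.Parameter(pretrained.weight.detach().clone(),
                                   requires_grad=False)
        self.bias = (nn.Parameter(pretrained.bias.detach().clone(),
                                  requires_grad=False)
                     if pretrained.bias is not None else None)
        # Trainable scores; score > 0 means keep, score <= 0 means prune.
        self.scores = nn.Parameter(torch.randn_like(self.weight) * 0.01)

    def forward(self, x):
        # Hard binary mask in the forward pass ...
        hard = (self.scores > 0).float()
        soft = torch.sigmoid(self.scores)
        # ... with a straight-through estimator: the forward value
        # equals `hard`, but gradients flow through the sigmoid.
        mask = hard + soft - soft.detach()
        return nn.functional.linear(x, self.weight * mask, self.bias)
```

Training such a module on the downstream task updates only the scores (and thus the sub-network topology), which matches the abstract's description that all remaining connections keep their original weights.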
