arXiv:1912.00544 [cs.CL]
Multi-Scale Self-Attention for Text Classification
Qipeng Guo, Xipeng Qiu, Pengfei Liu, Xiangyang Xue, Zheng Zhang
Published 2019-12-02 (Version 1)
In this paper, we introduce prior knowledge of multi-scale structure into self-attention modules. We propose a Multi-Scale Transformer that uses multi-scale multi-head self-attention to capture features at different scales. Based on a linguistic perspective and an analysis of a Transformer (BERT) pre-trained on a large corpus, we further design a strategy to control the scale distribution at each layer. Results on three kinds of tasks (21 datasets) show that our Multi-Scale Transformer consistently and significantly outperforms the standard Transformer on small and moderate-size datasets.
Comments: Accepted at AAAI 2020
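The abstract does not spell out the mechanism, but one common way to realize "multi-scale multi-head self-attention" is to give each attention head its own local window width, with some heads left global. The sketch below illustrates that idea only; it is not the authors' implementation, and the class name `MultiScaleSelfAttention` and the `scales` parameter are illustrative assumptions.

```python
# Minimal sketch (assumed, not from the paper): multi-head self-attention
# where each head attends within a head-specific local window ("scale");
# a scale of None leaves that head's attention global.
import math
import torch
import torch.nn as nn
import torch.nn.functional as F


class MultiScaleSelfAttention(nn.Module):
    def __init__(self, d_model, scales=(3, 7, 15, None)):
        super().__init__()
        self.n_heads = len(scales)
        assert d_model % self.n_heads == 0
        self.d_head = d_model // self.n_heads
        self.scales = scales  # window size per head; None = full attention
        self.qkv = nn.Linear(d_model, 3 * d_model)
        self.out = nn.Linear(d_model, d_model)

    def forward(self, x):
        # x: (batch, seq_len, d_model)
        B, L, _ = x.shape
        q, k, v = self.qkv(x).chunk(3, dim=-1)

        def split(t):  # -> (batch, head, seq_len, d_head)
            return t.view(B, L, self.n_heads, self.d_head).transpose(1, 2)

        q, k, v = split(q), split(k), split(v)
        scores = q @ k.transpose(-2, -1) / math.sqrt(self.d_head)  # (B, H, L, L)

        # Band mask per head: position i may attend to j only if
        # |i - j| <= scale // 2; heads with scale=None stay unmasked.
        idx = torch.arange(L, device=x.device)
        dist = (idx[None, :] - idx[:, None]).abs()  # (L, L)
        for h, scale in enumerate(self.scales):
            if scale is not None:
                scores[:, h].masked_fill_(dist > (scale // 2), float("-inf"))

        attn = F.softmax(scores, dim=-1)
        ctx = (attn @ v).transpose(1, 2).reshape(B, L, -1)
        return self.out(ctx)


if __name__ == "__main__":
    layer = MultiScaleSelfAttention(d_model=64, scales=(3, 7, 15, None))
    x = torch.randn(2, 10, 64)
    print(layer(x).shape)  # torch.Size([2, 10, 64])
```

The paper's per-layer scale-distribution strategy would correspond to choosing a different `scales` tuple for each Transformer layer; the specific values above are placeholders.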