arXiv:2311.12678 [cs.LG]

Interpretation of the Transformer and Improvement of the Extractor

Zhe Chen

Published 2023-11-21 (Version 1)

It has been over six years since the Transformer architecture was introduced. Surprisingly, the vanilla Transformer architecture is still widely used today. One reason is that the lack of a deep understanding and comprehensive interpretation of the architecture makes it challenging to improve upon. In this paper, we first interpret the Transformer architecture comprehensively in plain words, based on our understanding and experience. The interpretations are further proved and verified. They also cover the Extractor, a family of drop-in replacements for multi-head self-attention in the Transformer architecture. We then propose an improvement to a type of Extractor that outperforms self-attention, without introducing additional trainable parameters. Experimental results demonstrate that the improved Extractor performs even better, showing a way to improve the Transformer architecture.
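The abstract does not spell out the Extractor's internals, but "drop-in replacement for multi-head self-attention" implies a precise interface contract. Below is a minimal, hypothetical sketch of that contract; the module name `DropInMixer` and its parameter-free cumulative-mean mixing are illustrative assumptions, not the paper's Extractor. Any such replacement must map a `(batch, seq_len, d_model)` tensor back to the same shape, so the surrounding Transformer block is untouched.

```python
import torch
import torch.nn as nn


class DropInMixer(nn.Module):
    """Hypothetical stand-in (not the paper's Extractor) showing the
    interface a drop-in replacement for multi-head self-attention must
    satisfy: (batch, seq_len, d_model) -> (batch, seq_len, d_model)."""

    def __init__(self, d_model: int):
        super().__init__()
        # A single output projection, mirroring the projection that
        # follows multi-head self-attention in the standard block.
        self.proj = nn.Linear(d_model, d_model)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, d_model)
        seq_len = x.size(1)
        # Causal cumulative mean over past tokens: a parameter-free
        # token-mixing step standing in for the attention weights.
        denom = torch.arange(
            1, seq_len + 1, device=x.device, dtype=x.dtype
        ).view(1, -1, 1)
        mixed = x.cumsum(dim=1) / denom
        return self.proj(mixed)


# Usage: swap the mixer into a Transformer block in place of
# self-attention; shapes and the residual connection are unchanged.
mixer = DropInMixer(d_model=64)
x = torch.randn(2, 10, 64)  # (batch, seq_len, d_model)
out = x + mixer(x)          # residual connection, as in a Transformer block
assert out.shape == x.shape
```

Because the shape contract is all the block requires, such a module can be compared against self-attention head-to-head, which is how the abstract's "without introducing additional trainable parameters" claim becomes testable.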
