arXiv:2301.13310 [cs.LG]

Alternating Updates for Efficient Transformers

Cenk Baykal, Dylan Cutler, Nishanth Dikkala, Nikhil Ghosh, Rina Panigrahy, Xin Wang

Published 2023-01-30, Version 1

It is well established that increasing scale in deep transformer networks leads to improved quality and performance. However, this increase in scale often comes with an increase in compute cost and inference latency. Consequently, research into methods that realize the benefits of increased scale without increasing compute cost becomes important. We introduce Alternating Updates (AltUp), a simple-to-implement method to increase a model's capacity without the computational burden. AltUp enables the widening of the learned representation without increasing the computation time by working on a subblock of the representation at each layer. Our experiments on various transformer models and language tasks demonstrate the consistent effectiveness of alternating updates on a diverse set of benchmarks. Finally, we present extensions of AltUp to the sequence dimension, and demonstrate how AltUp can be synergistically combined with existing approaches, such as Sparse Mixture-of-Experts models, to obtain efficient models with even higher capacity.
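To make the core idea concrete, below is a minimal PyTorch sketch of an alternating-updates-style layer: the representation is kept K times wider, but only one d-dimensional sub-block is passed through the expensive transformer layer at each step, while the remaining sub-blocks are updated with lightweight learned mixing. This is an illustrative assumption-laden sketch, not the authors' implementation; the class name AltUpBlock, the predict-and-correct mixing coefficients, and all hyperparameters are invented here for exposition.

# Illustrative sketch of the alternating-updates idea (not the paper's code).
# Assumptions: AltUpBlock, predict_coef, correct_coef and all sizes are hypothetical.
import torch
import torch.nn as nn


class AltUpBlock(nn.Module):
    """Maintains a K-times wider representation (K sub-blocks of width d),
    but runs the expensive transformer layer on only one sub-block per step."""

    def __init__(self, layer: nn.Module, num_blocks: int):
        super().__init__()
        self.layer = layer  # e.g. a standard transformer layer of width d
        self.K = num_blocks
        # Learned scalar coefficients for cheaply predicting and correcting sub-blocks.
        self.predict_coef = nn.Parameter(
            torch.eye(num_blocks) + 0.01 * torch.randn(num_blocks, num_blocks)
        )
        self.correct_coef = nn.Parameter(torch.ones(num_blocks))

    def forward(self, blocks: list, activated: int) -> list:
        # blocks: K tensors of shape (batch, seq, d); `activated` alternates across layers.
        # 1) Predict every sub-block as a linear combination of the old sub-blocks.
        predicted = [
            sum(self.predict_coef[i, j] * blocks[j] for j in range(self.K))
            for i in range(self.K)
        ]
        # 2) Run the expensive layer on the activated sub-block only.
        computed = self.layer(blocks[activated])
        # 3) Correct all sub-blocks using the error observed on the activated one.
        error = computed - predicted[activated]
        return [predicted[i] + self.correct_coef[i] * error for i in range(self.K)]


# Usage sketch: alternate the activated sub-block across layers.
d, K, L = 64, 2, 4
layers = [
    AltUpBlock(nn.TransformerEncoderLayer(d, nhead=4, batch_first=True), K)
    for _ in range(L)
]
x = torch.randn(8, 16, d)                # (batch, seq, d)
blocks = [x.clone() for _ in range(K)]   # widened representation: K sub-blocks
for idx, blk in enumerate(layers):
    blocks = blk(blocks, activated=idx % K)
output = torch.cat(blocks, dim=-1)       # widened output of size K * d

Note that each layer's compute is dominated by a single width-d transformer call, so the per-layer cost stays close to the unwidened model even though the carried representation has width K * d.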

Related articles:
arXiv:2009.06732 [cs.LG] (Published 2020-09-14)
Efficient Transformers: A Survey
arXiv:2501.10714 [cs.LG] (Published 2025-01-18)
FSMoE: A Flexible and Scalable Training System for Sparse Mixture-of-Experts Models
Xinglin Pan et al.
arXiv:2402.07545 [cs.LG] (Published 2024-02-12, updated 2025-05-07)
TransAxx: Efficient Transformers with Approximate Computing