arXiv:2402.10198 [cs.LG]

Unlocking the Potential of Transformers in Time Series Forecasting with Sharpness-Aware Minimization and Channel-Wise Attention

Romain Ilbert, Ambroise Odonnat, Vasilii Feofanov, Aladin Virmaux, Giuseppe Paolo, Themis Palpanas, Ievgen Redko

Published 2024-02-15, updated 2024-02-19 (version 2)

Transformer-based architectures have achieved breakthrough performance in natural language processing and computer vision, yet they remain inferior to simpler linear baselines in multivariate long-term forecasting. To better understand this phenomenon, we start by studying a toy linear forecasting problem, for which we show that transformers are incapable of converging to the true solution despite their high expressive power. We further identify the attention mechanism of transformers as responsible for this low generalization capacity. Building upon this insight, we propose a shallow, lightweight transformer model that successfully escapes bad local minima when optimized with sharpness-aware minimization. We empirically demonstrate that this result extends to all commonly used real-world multivariate time series datasets. In particular, SAMformer surpasses the current state-of-the-art model TSMixer by 14.33% on average, while having ~4 times fewer parameters. The code is available at https://github.com/romilbert/samformer.
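The abstract names two ingredients: attention applied channel-wise (over the series rather than over time steps) in a shallow transformer, and training with sharpness-aware minimization (SAM). Below is a minimal, hypothetical PyTorch 2.x sketch of how such pieces could fit together; it is not the authors' implementation (see the linked repository for that), and the module names, dimensions, and the `rho` value are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class ChannelWiseAttentionForecaster(nn.Module):
    """Sketch: one attention block over channels, then a linear head over time."""

    def __init__(self, num_channels: int, seq_len: int, horizon: int, d_model: int = 16):
        super().__init__()
        self.q = nn.Linear(seq_len, d_model)
        self.k = nn.Linear(seq_len, d_model)
        self.v = nn.Linear(seq_len, seq_len)
        self.head = nn.Linear(seq_len, horizon)  # linear forecaster over the time axis

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, num_channels, seq_len) -- attention mixes channels, not time steps.
        attn = F.scaled_dot_product_attention(self.q(x), self.k(x), self.v(x))
        x = x + attn                  # residual connection
        return self.head(x)           # (batch, num_channels, horizon)


def sam_step(model, loss_fn, x, y, base_opt, rho: float = 0.05):
    """One sharpness-aware minimization update (Foret et al. style), sketched."""
    # 1) Ascent step: perturb the weights towards the locally worst case.
    loss_fn(model(x), y).backward()
    grad_norm = torch.sqrt(sum((p.grad ** 2).sum()
                               for p in model.parameters() if p.grad is not None))
    eps = []
    with torch.no_grad():
        for p in model.parameters():
            if p.grad is None:
                eps.append(None)
                continue
            e = rho * p.grad / (grad_norm + 1e-12)
            p.add_(e)
            eps.append(e)
    model.zero_grad()

    # 2) Descent step: gradient at the perturbed point, applied to the original weights.
    loss_fn(model(x), y).backward()
    with torch.no_grad():
        for p, e in zip(model.parameters(), eps):
            if e is not None:
                p.sub_(e)             # restore the original weights
    base_opt.step()
    base_opt.zero_grad()


# Toy usage with assumed shapes (7 channels, input length 512, horizon 96):
model = ChannelWiseAttentionForecaster(num_channels=7, seq_len=512, horizon=96)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
x, y = torch.randn(32, 7, 512), torch.randn(32, 7, 96)
sam_step(model, F.mse_loss, x, y, opt)
```

The point of the sketch is the separation of concerns: channel-wise attention keeps the block shallow and parameter-light, while the two-step SAM update (perturb, then descend) is what the paper credits for escaping the bad local minima that plain gradient descent gets stuck in.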

Comments: The first two authors contributed equally
Categories: cs.LG, stat.ML
Related articles:
arXiv:1907.00235 [cs.LG] (Published 2019-06-29)
Enhancing the Locality and Breaking the Memory Bottleneck of Transformer on Time Series Forecasting
arXiv:1910.09620 [cs.LG] (Published 2019-10-21)
You May Not Need Order in Time Series Forecasting
arXiv:2009.09110 [cs.LG] (Published 2020-09-18)
Explainable boosted linear regression for time series forecasting