arXiv:2307.13421 [cs.LG]

On the Learning Dynamics of Attention Networks

Published 2023-07-25, Version 1

Attention models are typically learned by optimizing one of three standard loss functions, variously called soft attention, hard attention, and latent variable marginal likelihood (LVML) attention. All three paradigms are motivated by the same goal of finding two models: a "focus" model that "selects" the right segment of the input and a "classification" model that processes the selected segment into the target label. However, they differ significantly in the way the selected segments are aggregated, resulting in distinct dynamics and final results. We observe a unique signature of models learned using these paradigms and explain this as a consequence of the evolution of the classification model under gradient descent when the focus model is fixed. We also analyze these paradigms in a simple setting and derive closed-form expressions for the parameter trajectory under gradient flow. With the soft attention loss, the focus model improves quickly at initialization and splutters later on. On the other hand, the hard attention loss behaves in the opposite fashion. Based on our observations, we propose a simple hybrid approach that combines the advantages of the different loss functions and demonstrate it on a collection of semi-synthetic and real-world datasets.
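The abstract does not spell out the three losses, so the following is only a minimal sketch under stated assumptions: the focus model yields softmax weights over input segments, the classification model is a hypothetical linear softmax classifier, and the per-example loss is cross-entropy. Under those assumptions, the three paradigms differ only in where the aggregation over segments happens: before the classifier (soft), over the per-segment losses (hard, computed here as an exact expectation rather than by sampling), or over the per-segment likelihoods (LVML). All function and parameter names are illustrative and not taken from the paper.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def classifier_probs(segment, weights):
    """Hypothetical linear classifier: one weight vector per class."""
    logits = [sum(w_i * x_i for w_i, x_i in zip(w, segment)) for w in weights]
    return softmax(logits)


def soft_attention_loss(segments, focus_scores, weights, label):
    """Aggregate segments first, then classify the weighted average."""
    alpha = softmax(focus_scores)
    dim = len(segments[0])
    mixed = [sum(a * seg[d] for a, seg in zip(alpha, segments)) for d in range(dim)]
    return -math.log(classifier_probs(mixed, weights)[label])


def hard_attention_loss(segments, focus_scores, weights, label):
    """Expected per-segment loss under the focus distribution (exact sum)."""
    alpha = softmax(focus_scores)
    return sum(
        a * -math.log(classifier_probs(seg, weights)[label])
        for a, seg in zip(alpha, segments)
    )


def lvml_loss(segments, focus_scores, weights, label):
    """Negative log of the likelihood marginalized over the chosen segment."""
    alpha = softmax(focus_scores)
    marginal = sum(
        a * classifier_probs(seg, weights)[label]
        for a, seg in zip(alpha, segments)
    )
    return -math.log(marginal)


# Toy data (made up for illustration): three 2-d segments, two classes.
segments = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
weights = [[2.0, -1.0], [-1.0, 2.0]]
focus_scores = [0.3, -0.2, 0.1]
label = 0

soft = soft_attention_loss(segments, focus_scores, weights, label)
hard = hard_attention_loss(segments, focus_scores, weights, label)
lvml = lvml_loss(segments, focus_scores, weights, label)
```

In this sketch the LVML loss never exceeds the hard attention loss (Jensen's inequality applied to the negative logarithm), and all three coincide when the focus weights concentrate on a single segment, so any difference between them comes from how uncertain focus weights are aggregated.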

Comments: Preprint: Accepted at ECAI-2023

Categories: cs.LG, cs.AI

Keywords: attention networks, learning dynamics, focus model, hard attention loss, standard loss functions

Related articles:

arXiv:1905.01320 [cs.LG] (Published 2019-05-03)

Meta-learners' learning dynamics are unlike learners'

Neil C. Rabinowitz

arXiv:2205.14590 [cs.LG] (Published 2022-05-29)

Independent and Decentralized Learning in Markov Potential Games

Chinmay Maheshwari, Manxi Wu, Druv Pai, Shankar Sastry

arXiv:2403.18742 [cs.LG] (Published 2024-03-27)

Understanding the Learning Dynamics of Alignment with Human Feedback

Shawn Im, Yixuan Li
