arXiv:2107.08579 [cs.CV]

Action Forecasting with Feature-wise Self-Attention

Yan Bin Ng, Basura Fernando

Published 2021-07-19 (Version 1)

We present a new architecture for human action forecasting from videos. A temporal recurrent encoder captures the temporal information of the input video, while a self-attention model attends to the relevant feature dimensions of the input space. To handle temporal variations in the observed video data, a feature masking technique is employed. An auxiliary classifier accurately classifies the observed actions, which helps the model understand what has happened so far. The decoder then generates future actions based on the outputs of the recurrent encoder and the self-attention model. Experimentally, we validate each component of our architecture, showing the impact of the feature-wise self-attention that identifies relevant feature dimensions, of the temporal masking, and of the auxiliary classifier over observed actions. We evaluate our method on two standard action forecasting benchmarks and obtain state-of-the-art results.
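To make the described pipeline concrete, below is a minimal PyTorch-style sketch of the components named in the abstract: feature-wise self-attention, a recurrent encoder with optional feature masking, an auxiliary classifier for the observed segment, and a decoder that forecasts future actions. All module names, dimensions, and the exact attention form are hypothetical illustrations, not the authors' code.

    import torch
    import torch.nn as nn

    class FeatureWiseSelfAttention(nn.Module):
        """Attends over feature dimensions (not time steps): each of the
        D input dimensions receives a weight and the input is re-scaled."""
        def __init__(self, dim):
            super().__init__()
            self.score = nn.Linear(dim, dim)  # one score per feature dimension

        def forward(self, x):                                   # x: (batch, time, dim)
            weights = torch.softmax(self.score(x), dim=-1)      # (batch, time, dim)
            return x * weights                                  # feature-wise re-weighting

    class ActionForecaster(nn.Module):
        def __init__(self, dim=512, hidden=256, num_classes=48, horizon=8):
            super().__init__()
            self.attn = FeatureWiseSelfAttention(dim)
            self.encoder = nn.GRU(dim, hidden, batch_first=True)  # temporal recurrent encoder
            self.aux_head = nn.Linear(hidden, num_classes)        # classifies observed actions
            self.decoder = nn.GRU(hidden, hidden, batch_first=True)
            self.out_head = nn.Linear(hidden, num_classes)
            self.horizon = horizon

        def forward(self, feats, mask=None):
            # feats: (batch, time, dim); mask: optional binary feature mask of the
            # same shape, standing in for the paper's feature masking technique.
            if mask is not None:
                feats = feats * mask
            attended = self.attn(feats)
            enc_out, h = self.encoder(attended)
            aux_logits = self.aux_head(enc_out[:, -1])            # what happened so far
            # Unroll the decoder for the forecast horizon from the encoder state.
            dec_in = enc_out[:, -1:].repeat(1, self.horizon, 1)
            dec_out, _ = self.decoder(dec_in, h)
            future_logits = self.out_head(dec_out)                # (batch, horizon, classes)
            return future_logits, aux_logits

Under these assumptions, a forward pass on features of shape (batch, time, 512) returns per-step logits for the forecast horizon plus auxiliary logits for the observed segment, so the auxiliary classification loss can be trained jointly with the forecasting loss.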

Related articles:
arXiv:2210.07354 [cs.CV] (Published 2022-10-13)
Finding Islands of Predictability in Action Forecasting
arXiv:1901.03728 [cs.CV] (Published 2019-01-11)
Anticipation and next action forecasting in video: an end-to-end model with memory