arXiv:1711.04323 Abstract | arXiv Analytics

arXiv:1711.04323 [cs.CV]

High-Order Attention Models for Visual Question Answering

Idan Schwartz, Alexander G. Schwing, Tamir Hazan

Published 2017-11-12, Version 1

The quest for algorithms that enable cognitive abilities is an important part of machine learning. A common trait in many recently investigated cognitive-like tasks is that they take into account different data modalities, such as visual and textual input. In this paper we propose a novel and generally applicable form of attention mechanism that learns high-order correlations between various data modalities. We show that high-order correlations effectively direct the appropriate attention to the relevant elements in the different data modalities that are required to solve the joint task. We demonstrate the effectiveness of our high-order attention mechanism on the task of visual question answering (VQA), where we achieve state-of-the-art performance on the standard VQA dataset.
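The abstract describes the mechanism only at a high level. The following is a minimal pure-Python sketch of the general idea of second-order attention between two modalities. It is not the authors' implementation, and every name in it (`second_order_attention`, `unary_img`, `unary_q`, `pair_weight`, `attend`) is hypothetical. Several things are assumed here: image regions and question words are already embedded into vectors of the same dimension, unary potentials are dot products with fixed weight vectors, and the pairwise (second-order) correlation table is reduced to one score per element by averaging over the other modality. Each modality's attention is a softmax over its combined potentials.

```python
import math
from typing import List, Tuple

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def softmax(scores: List[float]) -> List[float]:
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def second_order_attention(image: List[Vector], question: List[Vector],
                           unary_img: Vector, unary_q: Vector,
                           pair_weight: float = 1.0) -> Tuple[List[float], List[float]]:
    """Hypothetical sketch: attention from unary plus pairwise potentials."""
    # Pairwise potentials: correlation of every image region with every word.
    corr = [[dot(v, q) for q in question] for v in image]
    # Assumption: reduce the pairwise table to one score per element by averaging.
    pair_img = [sum(row) / len(question) for row in corr]
    pair_q = [sum(corr[i][j] for i in range(len(image))) / len(image)
              for j in range(len(question))]
    # Combine unary and (weighted) pairwise potentials, then normalize.
    img_scores = [dot(v, unary_img) + pair_weight * p for v, p in zip(image, pair_img)]
    q_scores = [dot(q, unary_q) + pair_weight * p for q, p in zip(question, pair_q)]
    return softmax(img_scores), softmax(q_scores)


def attend(features: List[Vector], weights: List[float]) -> Vector:
    # Attention-weighted sum of the feature vectors of one modality.
    dim = len(features[0])
    return [sum(w * f[k] for w, f in zip(weights, features)) for k in range(dim)]


# Toy usage: with zero unary weights, attention is driven purely by the
# cross-modal correlations, so the region most aligned with the question wins.
image = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
question = [[1.0, 0.0], [0.9, 0.1]]
p_img, p_q = second_order_attention(image, question, [0.0, 0.0], [0.0, 0.0])
attended_image = attend(image, p_img)
```

Here region 0 is most correlated with both words, so it receives the largest attention weight. For three modalities (for example image, question, and candidate answer), the same idea would extend to a third-order correlation tensor; how that tensor is reduced is again an assumption outside this sketch.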

Comments: 9 pages, 8 figures, NIPS 2017

Categories: cs.CV, cs.AI, cs.LG

Keywords: visual question answering, high-order attention models, data modalities, learns high-order correlations, high-order correlations effectively direct

Related articles:

arXiv:1704.08243 [cs.CV] (Published 2017-04-26)

C-VQA: A Compositional Split of the Visual Question Answering (VQA) v1.0 Dataset

Aishwarya Agrawal, Aniruddha Kembhavi, Dhruv Batra, Devi Parikh

arXiv:2001.08730 [cs.CV] (Published 2020-01-23)

Robust Explanations for Visual Question Answering

Badri N. Patro, Shivansh Pate, Vinay P. Namboodiri

arXiv:1902.09487 [cs.CV] (Published 2019-02-25)

MUREL: Multimodal Relational Reasoning for Visual Question Answering