June 19, 2018 (Open Access)

Learning Conditioned Graph Structures for Interpretable Visual Question Answering

Abstract

Visual Question Answering is a challenging problem requiring a combination of concepts from Computer Vision and Natural Language Processing. Most existing approaches use a two-stream strategy, computing image and question features that are subsequently merged using a variety of techniques. Nonetheless, very few rely on higher-level image representations, which can capture semantic and spatial relationships. In this paper, we propose a novel graph-based approach for Visual Question Answering. Our method combines a graph learner module, which learns a question-specific graph representation of the input image, with the recent concept of graph convolutions, aiming to learn image representations that capture question-specific interactions. We test our approach on the VQA v2 dataset using a simple baseline architecture enhanced by the proposed graph module. We obtain promising results with 66.18% accuracy and demonstrate the interpretability of the proposed method. Code can be found at …com/aimbrain/vqa-project.
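The abstract describes the pipeline only at a high level: a graph learner that builds a question-specific graph over image objects, followed by graph convolutions over that graph. The NumPy sketch below illustrates that idea under stated assumptions and is not the authors' released code. The helper names (`question_conditioned_adjacency`, `graph_conv`), the concatenation-based conditioning, the random weight matrices standing in for learned parameters, the top-k sparsification and the plain weighted-averaging graph convolution are all hypothetical choices made for illustration; the paper's actual graph convolution may differ.

```python
import numpy as np

def question_conditioned_adjacency(obj_feats, q_emb, W, k):
    """Hypothetical question-conditioned graph learner (illustrative only).

    obj_feats: (N, dv) features of N detected image objects
    q_emb:     (dq,)   question embedding
    W:         (dv + dq, de) projection weights (would be learned in practice)
    k:         number of neighbours kept per node
    Returns a row-normalised (N, N) adjacency matrix with at most k
    non-zero entries per row.
    """
    n = obj_feats.shape[0]
    # Condition every object on the question by concatenating q to each row.
    joint = np.concatenate([obj_feats, np.tile(q_emb, (n, 1))], axis=1)
    emb = np.maximum(joint @ W, 0.0)   # (N, de) ReLU projection
    scores = emb @ emb.T               # (N, N) pairwise edge scores
    # Keep only the top-k scores in each row; mask the rest with -inf.
    adj = np.full_like(scores, -np.inf)
    top = np.argsort(-scores, axis=1)[:, :k]
    rows = np.arange(n)[:, None]
    adj[rows, top] = scores[rows, top]
    # Softmax over the kept neighbours (exp(-inf) = 0), so each row sums to 1.
    adj = np.exp(adj - adj.max(axis=1, keepdims=True))
    return adj / adj.sum(axis=1, keepdims=True)

def graph_conv(adj, feats, W_out):
    """Simplified graph convolution (assumption): weighted neighbour
    averaging followed by a linear projection and ReLU."""
    return np.maximum(adj @ feats @ W_out, 0.0)

# Usage with random stand-in data (sizes are arbitrary).
rng = np.random.default_rng(0)
N, dv, dq, de, dh, k = 5, 8, 4, 6, 3, 2
V = rng.standard_normal((N, dv))   # object features
q = rng.standard_normal(dq)        # question embedding
A = question_conditioned_adjacency(V, q, rng.standard_normal((dv + dq, de)), k)
H = graph_conv(A, V, rng.standard_normal((dv, dh)))
print(A.shape, H.shape)  # (5, 5) (5, 3)
```

Because the question embedding enters the edge scores, a different question can yield a different graph over the same image objects, which is the property the abstract links to interpretability: the learned edges can be inspected per question.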