Visual Question Answering is a challenging problem requiring a combination of concepts from Computer Vision and Natural Language Processing. Most existing approaches use a two-stream strategy, computing image and question features that are subsequently merged using a variety of techniques. Nonetheless, very few rely on higher-level image representations, which can capture semantic and spatial relationships. In this paper, we propose a novel graph-based approach for Visual Question Answering. Our method combines a graph learner module, which learns a question-specific graph representation of the input image, with the recent concept of graph convolutions, aiming to learn image representations that capture question-specific interactions. We test our approach on the VQA v2 dataset using a simple baseline architecture enhanced by the proposed graph learner module. We obtain promising results with 66.18% accuracy and demonstrate the interpretability of the proposed method. Code can be found at. com/aimbrain/vqa-project.
Norcliffe-Brown et al. (Tue,) studied this question.
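
To make the idea in the abstract more concrete, the following is a minimal sketch of how a question-conditioned graph over image regions might be built and then used in one graph convolution step. It is not the authors' implementation: the function names (`learn_graph`, `graph_conv`), the dot-product edge scores, the top-k neighbour selection, the softmax normalisation, the plain weighted-average convolution, and all dimensions and random weights are assumptions chosen only for illustration.

```python
import math
import random


def matvec(weights, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in weights]


def learn_graph(region_feats, question_vec, w_joint, top_k):
    """Build a question-conditioned adjacency matrix over image regions.

    Assumption: each region feature is concatenated with the question
    vector and projected by w_joint (hypothetical learned weights). Edge
    scores are dot products of the projected vectors; each node keeps its
    top_k strongest neighbours, normalised with a softmax.
    """
    joint = [matvec(w_joint, feat + question_vec) for feat in region_feats]
    n = len(joint)
    scores = [[sum(a * b for a, b in zip(joint[i], joint[j])) for j in range(n)]
              for i in range(n)]
    adjacency = [[0.0] * n for _ in range(n)]
    for i in range(n):
        keep = sorted(range(n), key=lambda j: scores[i][j], reverse=True)[:top_k]
        peak = max(scores[i][j] for j in keep)
        exps = {j: math.exp(scores[i][j] - peak) for j in keep}
        total = sum(exps.values())
        for j in keep:
            adjacency[i][j] = exps[j] / total
    return adjacency


def graph_conv(region_feats, adjacency, w_out):
    """One simplified graph convolution: average neighbours, project, ReLU."""
    n, dim = len(region_feats), len(region_feats[0])
    updated = []
    for i in range(n):
        aggregated = [sum(adjacency[i][j] * region_feats[j][k] for j in range(n))
                      for k in range(dim)]
        updated.append([max(0.0, h) for h in matvec(w_out, aggregated)])
    return updated


# Toy usage with made-up sizes: 6 regions, 4-d region features, 3-d question.
random.seed(0)
regions = [[random.random() for _ in range(4)] for _ in range(6)]
question = [random.random() for _ in range(3)]
w_joint = [[random.gauss(0.0, 0.1) for _ in range(7)] for _ in range(5)]
w_out = [[random.gauss(0.0, 0.1) for _ in range(4)] for _ in range(2)]

adjacency = learn_graph(regions, question, w_joint, top_k=3)
updated = graph_conv(regions, adjacency, w_out)
```

Because the adjacency matrix depends on the question vector, a different question would in general yield a different set of edges over the same image regions. Inspecting which edges are kept is one way such a model could be examined for interpretability.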