February 19, 2024

Visual Question Answering

Abstract

Visual question answering (VQA) is a task spanning artificial intelligence (AI) and computer vision (CV) in which a system answers questions about the visual content of an image, such as "what color is the bus?" or "how many people are in the photo?" VQA has shown great potential in domains ranging from medical imaging and autonomous driving to virtual assistants and search engines. This study develops a framework that tackles VQA research challenges by adopting and extending recent advances in attention techniques, natural language processing, and image classification models. In addition, unlike previous work that uses static question embeddings, we investigate how alternative dynamic embedding models improve performance on the VQA task. The work is evaluated on the VQA v2 dataset and shows a 9% improvement over the results obtained with static word embeddings. We also deployed the model as a cloud-based VQA system to support VQA tasks in real-world applications.
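To make the static-versus-dynamic distinction and the role of attention more concrete, here is a minimal toy sketch in plain Python. It is not the authors' framework: the abstract does not name the embedding or attention models used. It assumes "dynamic" means contextual embeddings (a word's vector depends on the surrounding question, as in transformer language models), approximated here by one parameter-free self-attention pass. The word vectors, image region features, and the functions `static_embed`, `dynamic_embed`, and `attend_regions` are all hypothetical illustrations.

```python
import math

# Hypothetical 3-d static word vectors, for illustration only.
STATIC = {
    "what": [0.1, 0.0, 0.2],
    "color": [0.9, 0.1, 0.0],
    "is": [0.0, 0.1, 0.1],
    "the": [0.0, 0.0, 0.1],
    "bus": [0.2, 0.8, 0.1],
    "how": [0.1, 0.1, 0.0],
    "many": [0.0, 0.9, 0.3],
    "people": [0.3, 0.2, 0.9],
    "in": [0.0, 0.1, 0.0],
}


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def softmax(scores):
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def weighted_sum(weights, vectors):
    return [sum(w * v[d] for w, v in zip(weights, vectors))
            for d in range(len(vectors[0]))]


def mean(vectors):
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def static_embed(tokens):
    """Static embedding: a table lookup, so a word's vector never changes."""
    return [STATIC[t] for t in tokens]


def dynamic_embed(tokens):
    """Toy contextual embedding: one parameter-free self-attention pass.
    Each token becomes an attention-weighted mix of every token in the
    question, so the same word gets different vectors in different questions."""
    vecs = static_embed(tokens)
    scale = math.sqrt(len(vecs[0]))
    out = []
    for query in vecs:
        weights = softmax([dot(query, key) / scale for key in vecs])
        out.append(weighted_sum(weights, vecs))
    return out


def attend_regions(question_vec, region_feats):
    """Question-guided attention: score each image region by its dot product
    with the question vector, normalize with softmax, and pool the regions."""
    weights = softmax([dot(question_vec, r) for r in region_feats])
    return weights, weighted_sum(weights, region_feats)


if __name__ == "__main__":
    q1 = "what color is the bus".split()
    q2 = "how many people in the bus".split()

    # "bus" is identical under static embedding, different under dynamic.
    print(static_embed(q1)[-1] == static_embed(q2)[-1])    # True
    print(dynamic_embed(q1)[-1] == dynamic_embed(q2)[-1])  # False

    # Hypothetical features for three image regions.
    regions = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    weights, pooled = attend_regions(mean(dynamic_embed(q1)), regions)
    print([round(w, 3) for w in weights], [round(x, 3) for x in pooled])
```

The design choice being illustrated is simply that a static lookup gives "bus" the same vector in both questions, while the contextual pass mixes in the other words, so "bus" after "what color" differs from "bus" after "how many people". A real VQA model would learn these vectors and the attention projections rather than hard-coding them.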

Cite This Study

Nada et al. (2024). Visual Question Answering.

synapsesocial.com/papers/68e78a66b6db6435876fd210 https://doi.org/10.1109/icnc59896.2024.10556344