Visual question answering (VQA) is a comprehensive task spanning artificial intelligence (AI) and computer vision (CV) in which a system answers questions about the visual content of an image, such as "What color is the bus?" or "How many people are in the photo?" VQA has shown great potential and importance in various domains, ranging from medical imaging and autonomous driving to virtual assistants and search engines. This study develops a framework to tackle VQA research challenges by adopting and extending recent breakthroughs in attention techniques, natural language processing, and image classification models. In addition, unlike previous work that uses static question embeddings, we investigate how alternative dynamic embedding models enhance the effectiveness of the VQA task. We evaluate the work on the recently developed VQA v2 dataset and obtain a 9% improvement over the results obtained with static word embeddings. We also deployed the model as a cloud-based VQA system to facilitate VQA tasks in real-life applications.
Nada et al. studied this question.
www.synapsesocial.com/papers/68e78a66b6db6435876fd210 — DOI: https://doi.org/10.1109/icnc59896.2024.10556344
Ahmed Nada
Min Chen
University of Washington Bothell