May 30, 2024 (Open Access)

QAVidCap: Enhancing Video Captioning through Question Answering Techniques

Abstract

Video captioning is the task of describing video content in natural-language sentences. Although recent models have improved significantly on evaluation metrics, unresolved issues remain: model-generated captions often contain factual errors and omit important details, whereas human-written captions describe video content accurately and comprehensively. In this work, we propose a novel method that uses question answering (QA) techniques to enhance video captioning models. We first generate QA pairs from both videos and human-written captions. We then propose a QA-enhanced captioning model to better leverage this QA information. Finally, we employ reinforcement learning to train the model to maximize a QA reward. By incorporating these QA-related techniques, our model generates more accurate and comprehensive video captions. We conduct experiments on three datasets: ActivityNet Captions, YouCookII, and MSR-VTT. Experimental results, ablation studies, and human evaluations demonstrate the advantages of our method.
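The abstract does not define the QA reward, so the following is only an illustrative sketch of one plausible form: the fraction of reference QA pairs that can be answered correctly from the generated caption alone. The names `qa_reward`, `toy_answerer`, and `normalize` are hypothetical, and the yes/no keyword answerer is a toy stand-in for a learned QA model, not the paper's component.

```python
from typing import Callable, List, Tuple

QAPair = Tuple[str, str]  # (question, reference answer)


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so answers compare loosely."""
    return " ".join(text.lower().split())


def qa_reward(
    caption: str,
    qa_pairs: List[QAPair],
    answer_fn: Callable[[str, str], str],
) -> float:
    """Fraction of questions answered correctly using only the caption.

    ASSUMPTION: answer_fn(question, caption) stands in for a QA model;
    the paper's actual reward definition is not given in the abstract.
    """
    if not qa_pairs:
        return 0.0
    correct = sum(
        normalize(answer_fn(question, caption)) == normalize(answer)
        for question, answer in qa_pairs
    )
    return correct / len(qa_pairs)


def toy_answerer(question: str, caption: str) -> str:
    """Toy stand-in (assumption): answers "yes" if the question's last
    word occurs in the caption, otherwise "no"."""
    keyword = normalize(question).rstrip("?").split()[-1]
    return "yes" if keyword in normalize(caption).split() else "no"


# Hypothetical QA pairs, as if derived from a video and its human caption.
example_pairs = [
    ("Does the video show a dog?", "yes"),
    ("Does the video show a park?", "yes"),
    ("Does the video show a cat?", "no"),
    ("Does the video show a ball?", "yes"),
]

# The caption never mentions the ball, so one of four answers is wrong.
print(qa_reward("A man walks a dog in the park", example_pairs, toy_answerer))  # 0.75
```

Under this reading, a caption that omits a detail (here, the ball) or states a wrong fact scores lower, and a scalar of this kind could serve as the reward signal in a policy-gradient update.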

Cite This Study

Liu et al. (2024) studied this question.

synapsesocial.com/papers/68e67b96b6db643587605053
https://doi.org/10.1145/3652583.3658061
