In this paper, we describe a system for generating textual descriptions of video clips using recurrent neural networks (RNNs), which we used in the Large Scale Movie Description Challenge 2015 at ICCV 2015. Our work builds on static image captioning systems with RNN-based language models and extends this framework to videos by utilizing both static image features and video-specific features. In addition, we study the usefulness of content classifiers as a source of additional information for caption generation. With experimental results, we show that combining keyframe-based features, dense trajectory video features, and content classifier outputs gives better performance than using any one of them individually.
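One simple way to combine several feature types into a single input for a caption generator is to normalize each vector and concatenate them. The sketch below illustrates only that generic idea. The abstract does not say how the system fuses its features, so the concatenation scheme, the L2 normalization, the function names (`l2_normalize`, `fuse_features`), and the toy vectors are all assumptions made for illustration.

```python
from typing import List, Sequence


def l2_normalize(vec: Sequence[float]) -> List[float]:
    """Scale a vector to unit L2 norm; an all-zero vector is returned as zeros."""
    norm = sum(x * x for x in vec) ** 0.5
    if norm == 0.0:
        return [0.0] * len(vec)
    return [x / norm for x in vec]


def fuse_features(
    keyframe: Sequence[float],
    trajectory: Sequence[float],
    classifier_scores: Sequence[float],
) -> List[float]:
    """Concatenate the three normalized feature vectors (assumed fusion scheme)."""
    fused: List[float] = []
    for part in (keyframe, trajectory, classifier_scores):
        # Normalizing each part keeps one feature type from dominating by scale.
        fused.extend(l2_normalize(part))
    return fused


# Toy vectors standing in for real features (hypothetical values).
fused = fuse_features([3.0, 4.0], [1.0, 0.0, 0.0], [0.0, 2.0])
```

In a full system, a vector like `fused` could serve as the visual input to the RNN language model; the dimensions here are far smaller than real keyframe or dense trajectory features.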
Shetty et al. studied this question.