Los puntos clave no están disponibles para este artículo en este momento.
Localizing moments in a longer video via natural language queries is a new, challenging task at the intersection of language and video understanding. Though moment localization with natural language is similar to other language and vision tasks like natural language object retrieval in images, moment localization offers an interesting opportunity to model temporal dependencies and reasoning in text.
Hendricks et al. (Mon,) studied this question.