What does this research mean for the field?

The Spatial and Language-Temporal Attention (SLTA) method, which leverages object-level local features and dual attention mechanisms, achieves superior performance in cross-modal video moment retrieval compared to existing state-of-the-art methods. Novelty: methodological. Consensus alignment: neutral.

June 5, 2019

Cross-Modal Video Moment Retrieval with Spatial and Language-Temporal Attention


Abstract

Given an untrimmed video and a description query, temporal moment retrieval aims to localize the temporal segment within the video that best describes the textual query. Existing studies predominantly employ coarse frame-level features as the visual representation, obscuring specific details that may provide critical cues for localizing the desired moment. We propose the SLTA (short for "Spatial and Language-Temporal Attention") method to address this missing-detail issue. Specifically, the SLTA method takes advantage of object-level local features and uses spatial attention to attend to the most relevant ones (e.g., the local features "girl" and "cup"). We then encode the sequence of local features on consecutive frames to capture the interaction information among these objects (e.g., the interaction "pour" involving these two objects). Meanwhile, a language-temporal attention is used to emphasize the keywords in the query based on moment context information. Together, the two proposed attention sub-networks can recognize the most relevant objects and interactions in the video while highlighting the keywords in the query. Extensive experiments on the TACoS, Charades-STA and DiDeMo datasets demonstrate the effectiveness of our model compared with state-of-the-art methods.
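The abstract describes the two attention sub-networks only in words. As a rough illustration, the sketch below assumes plain scaled dot-product attention with a softmax: spatial attention pools the object-level features in each frame that best match a query vector, and language-temporal attention weights the query words by their relevance to a moment-context vector. The function names (`attend`, `spatial_attention`, `language_temporal_attention`), the dot-product scoring, and the toy vectors are hypothetical; the paper's actual scoring functions, feature extractors, and temporal encoder are not given in the abstract and are not reproduced here.

```python
import math
from typing import List, Sequence, Tuple

# Hypothetical sketch: plain scaled dot-product attention, NOT the SLTA paper's exact formulation.


def softmax(scores: Sequence[float]) -> List[float]:
    """Normalize scores into weights that sum to 1 (scores must be non-empty)."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def attend(query: Sequence[float],
           items: Sequence[Sequence[float]]) -> Tuple[List[float], List[float]]:
    """Weight each item by its similarity to `query`; return (weights, weighted sum)."""
    d = len(query)
    weights = softmax([dot(query, item) / math.sqrt(d) for item in items])
    pooled = [sum(w * item[k] for w, item in zip(weights, items)) for k in range(d)]
    return weights, pooled


def spatial_attention(query_vec: Sequence[float],
                      frames: Sequence[Sequence[Sequence[float]]]) -> List[List[float]]:
    """For each frame (a list of object feature vectors), pool the objects most relevant to the query.

    The resulting per-frame sequence would then be passed to a temporal encoder
    (not modeled here) to capture object interactions across consecutive frames.
    """
    return [attend(query_vec, objects)[1] for objects in frames]


def language_temporal_attention(moment_context: Sequence[float],
                                word_vecs: Sequence[Sequence[float]]) -> Tuple[List[float], List[float]]:
    """Weight query words by their relevance to a moment-context vector."""
    return attend(moment_context, word_vecs)


if __name__ == "__main__":
    query = [1.0, 0.0]
    # Two frames, each with two toy object features (standing in for e.g. "girl" and "cup").
    frames = [[[1.0, 0.0], [0.0, 1.0]], [[0.0, 1.0], [2.0, 0.0]]]
    print(spatial_attention(query, frames))
    word_weights, _ = language_temporal_attention([1.0, 0.0], [[0.0, 1.0], [3.0, 0.0]])
    print(word_weights)  # the second word, aligned with the context, gets the larger weight
```

Each pooled vector is a convex combination of that frame's object features, so an object whose feature aligns with the query dominates the frame's representation. In a trained model, the query, object, and word vectors would typically come from learned projections rather than the raw toy values used here.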


Cite This Study

Jiang et al. (2019) studied this question.

synapsesocial.com/papers/6a0ed7bf1c5e2d2319f9f5ae https://doi.org/10.1145/3323873.3325019

