Local space-time features have recently shown promising results within the Bag-of-Features (BoF) approach to action recognition in video. Pure local features and descriptors, however, provide only limited discriminative power, implying ambiguity among features and sub-optimal classification performance. In this work, we propose to disambiguate local space-time features and to improve action recognition by integrating additional nonlocal cues with the BoF representation. For this purpose, we decompose video into region classes and augment local features with corresponding region-class labels. In particular, we investigate unsupervised and supervised video segmentation using (i) motion-based foreground segmentation, (ii) person detection, (iii) static action detection, and (iv) object detection. While such segmentation methods might be imperfect, they provide complementary region-level information to local features. We demonstrate how this information can be integrated with BoF representations in a kernel-combination framework. We evaluate our method on the recent and challenging Hollywood-2 action dataset and demonstrate significant improvements.
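The idea of augmenting local features with region-class labels and then combining channels in a kernel can be illustrated with a minimal sketch. Everything below is an assumption made for illustration, not the authors' implementation: the function names, the toy quantized features, and in particular the choice of an exponentiated chi-square kernel summed over channels, which the abstract does not specify.

```python
import numpy as np

def region_bof_histogram(words, regions, n_words, n_regions):
    """L1-normalised histogram over (region label, visual word) pairs.

    words   : visual-word index of each local feature (hypothetical input)
    regions : region-class label of each local feature (hypothetical input)
    """
    hist = np.zeros(n_regions * n_words)
    # Each feature votes into the bin of its own region class and word.
    np.add.at(hist, regions * n_words + words, 1.0)
    total = hist.sum()
    return hist / total if total > 0 else hist

def chi2_distance(h1, h2, eps=1e-10):
    """Chi-square distance between two histograms."""
    return 0.5 * np.sum((h1 - h2) ** 2 / (h1 + h2 + eps))

def combined_kernel(channels_a, channels_b, scales):
    """Assumed combination: exp(-sum_c D_c / A_c) over channels c.

    scales holds one positive normaliser A_c per channel, e.g. the mean
    training distance for that channel (an assumption of this sketch).
    """
    total = sum(chi2_distance(a, b) / s
                for a, b, s in zip(channels_a, channels_b, scales))
    return float(np.exp(-total))

# Toy video: four local features, three visual words, two region classes
# (e.g. 0 = background, 1 = foreground).
words = np.array([0, 2, 2, 1])
regions = np.array([0, 1, 1, 0])

plain = region_bof_histogram(words, np.zeros_like(words), 3, 1)
labelled = region_bof_histogram(words, regions, 3, 2)
```

In this sketch the plain histogram and the region-labelled histogram act as two channels of the same video; `combined_kernel([plain, labelled], [plain_other, labelled_other], scales)` would then give a single similarity value usable by a kernel classifier such as an SVM.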
Muhammad Muneeb Ullah
University of the Sciences
Sobhan Naderi Parizi
Google (United States)
Ivan Laptev
Mohamed bin Zayed University of Artificial Intelligence
Ullah et al. studied this question.
synapsesocial.com/papers/6a1589765347fbb1739fee72 — DOI: https://doi.org/10.5244/c.24.95