In this paper, we present TriDet, a one-stage framework for temporal action detection. Existing methods often produce imprecise boundary predictions because action boundaries in videos are ambiguous. To alleviate this problem, we propose a novel Trident-head that models each action boundary through an estimated relative probability distribution around it. In the feature pyramid of TriDet, we propose an efficient Scalable-Granularity Perception (SGP) layer that mitigates the rank loss problem that self-attention suffers on video features and aggregates information across different temporal granularities. Benefiting from the Trident-head and the SGP-based feature pyramid, TriDet achieves state-of-the-art performance on three challenging benchmarks, THUMOS14, HACS and EPIC-KITCHEN 100, at a lower computational cost than previous methods. For example, TriDet reaches an average mAP of 69.3% on THUMOS14, outperforming the previous best method by 2.5% while requiring only 74.6% of its latency. The code is available at https://github.com/dingfengshi/TriDet.
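The idea of estimating a boundary from a relative probability distribution can be illustrated with a minimal sketch. It assumes, for illustration only, that the start (or end) offset of an instant is the expectation over B+1 candidate bins of a softmax applied to the sum of a boundary-branch response at each candidate location and a center-offset response of the instant itself. The function names `expected_offset` and `trident_segment`, the bin count, the `stride` scaling and the masking of out-of-range bins are hypothetical choices, not taken from the paper or the released code.

```python
import math
from typing import List, Sequence, Tuple


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax; -inf logits receive zero probability."""
    m = max(logits)  # assumes at least one finite logit
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def expected_offset(boundary_logits: Sequence[float],
                    offset_logits: Sequence[float]) -> float:
    """Expected distance (in bins) from an instant to its boundary.

    boundary_logits[i]: boundary-branch response at the location i bins away.
    offset_logits[i]:   center-offset response of the instant for bin i.
    """
    if len(boundary_logits) != len(offset_logits):
        raise ValueError("both branches must cover the same number of bins")
    # Relative probability of each candidate bin being the boundary.
    probs = softmax([b + o for b, o in zip(boundary_logits, offset_logits)])
    # The expectation yields a continuous offset instead of a hard argmax.
    return sum(i * p for i, p in enumerate(probs))


def trident_segment(t: int,
                    start_resp: Sequence[float],
                    end_resp: Sequence[float],
                    left_off: Sequence[float],
                    right_off: Sequence[float],
                    num_bins: int,
                    stride: float = 1.0) -> Tuple[float, float]:
    """Decode a (start, end) segment for instant t on one (hypothetical) pyramid level."""
    n = len(start_resp)
    neg_inf = float("-inf")
    # Start candidates lie at t, t-1, ..., t-num_bins; bins before 0 are masked.
    start_logits = [start_resp[t - i] if t - i >= 0 else neg_inf
                    for i in range(num_bins + 1)]
    # End candidates lie at t, t+1, ..., t+num_bins; bins past the end are masked.
    end_logits = [end_resp[t + i] if t + i < n else neg_inf
                  for i in range(num_bins + 1)]
    d_start = expected_offset(start_logits, left_off)
    d_end = expected_offset(end_logits, right_off)
    return (t - d_start) * stride, (t + d_end) * stride


# Usage: with flat responses every bin is equally likely, so each offset is
# the mean bin index (0 + 1 + 2 + 3) / 4 = 1.5 and the segment is (3.5, 6.5).
print(trident_segment(5, [0.0] * 10, [0.0] * 10, [0.0] * 4, [0.0] * 4, num_bins=3))
```

Taking the expectation rather than the single most likely bin lets the predicted boundary fall between bins, which is one plausible way a distribution-based head could soften the effect of ambiguous boundaries.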
Dingfeng Shi
Yujie Zhong
Qiong Cao
Beihang University
www.synapsesocial.com/papers/6a006017ef8139f8ff778dfb — DOI: https://doi.org/10.1109/cvpr52729.2023.01808