Temporal action detection (TAD) is a challenging task in video understanding: the goal is to determine the semantic label and precise temporal boundaries of each action instance in an untrimmed video. Over the years, a variety of architectures, including convolutional, graph-based, and transformer networks, have been applied effectively to TAD. Most methods identify the action category well; however, their accuracy in localizing action boundaries remains insufficient. Because an action spans several consecutive frames of similar images, we suggest selecting the key frames in the video sequence and enhancing the TAD representation with additional features extracted from those key frames. We propose KeyMamba, a learnable network for TAD based on a state-space model. The model applies a bidirectional Mamba block to capture global features efficiently and a temporal deformable attention module to extract key frame features from video clips. These key frame features carry information about motion changes and complement the global features, so that action boundaries can be identified more accurately. In addition, to obtain higher-quality tokens in the spatial dimension, we add an attention mask before the bidirectional Mamba block encoder. Finally, we apply masking operations during the forward and backward scans within the bidirectional Mamba block to mitigate the impact of duplicate tokens. In our experiments, KeyMamba reaches an average mAP of 70.4 on THUMOS14 and an average mAP of 38.44 on ActivityNet-1.3.
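To make the idea of a masked bidirectional scan more concrete, the following is a minimal sketch and not the authors' implementation. It replaces Mamba's selective state-space update with a toy scalar recurrence `h_t = decay * h_{t-1} + x_t`, and it assumes that masking means "skip the token and leave the state unchanged". The function names, the `decay` parameter, and the choice to sum the two directions are all hypothetical, chosen only for illustration.

```python
def masked_scan(xs, mask, decay=0.5):
    """One-directional toy scan: h_t = decay * h_{t-1} + x_t.

    Assumption: a masked position (mask[t] is False) does not update
    the running state and emits 0.0 as its output.
    """
    h = 0.0
    out = []
    for x, keep in zip(xs, mask):
        if keep:
            h = decay * h + x
            out.append(h)
        else:
            out.append(0.0)
    return out


def bidirectional_masked_scan(xs, mask, decay=0.5):
    """Run the toy scan forward and backward, then sum the two outputs.

    Assumption: summation is used to fuse directions; a real model
    could fuse them differently.
    """
    fwd = masked_scan(xs, mask, decay)
    # Reverse inputs and mask, scan, then restore the original order.
    bwd = masked_scan(xs[::-1], mask[::-1], decay)[::-1]
    return [f + b for f, b in zip(fwd, bwd)]


if __name__ == "__main__":
    features = [1.0, 2.0, 3.0]
    keep = [True, False, True]  # middle token treated as a duplicate
    print(bidirectional_masked_scan(features, keep))
```

In this sketch the masked middle token contributes nothing in either direction, so the two remaining tokens see each other directly through the recurrence. That is one plausible reading of how masking could limit the influence of duplicate tokens during scanning.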
Zikai Chen
Dan Wei
Peixing Li
Journal of Electronic Imaging
Shanghai University of Engineering Science
Chen et al. studied this question.
www.synapsesocial.com/papers/68d90bc941e1c178a14f733c — DOI: https://doi.org/10.1117/1.jei.34.5.053022