April 22, 2024

CMC: Video Transformer Acceleration via CODEC Assisted Matrix Condensing

Abstract

Video Transformers (VidTs) have reached the forefront of accuracy in various video understanding tasks. Despite these achievements, the cost of processing a large number of video frames remains a significant performance bottleneck and impedes their deployment on resource-constrained platforms. Accelerators meticulously designed for Vision Transformers (ViTs) have emerged, but they may not be the optimal solution for VidTs, for two main reasons. First, these accelerators tend to overlook the inherent temporal redundancy that characterizes VidTs, which limits the room for further performance gains. Second, incorporating a sparse attention prediction module within these accelerators incurs considerable overhead.

Cite This Study

Song et al. (2024) studied this question.

synapsesocial.com/papers/68e6e1dcb6db64358765d60e https://doi.org/10.1145/3620665.3640393
