What type of study is this?

This is an experimental study.

September 30, 2025

Open Access

ViT-CAAC: Contribution-Aware Adaptive Compression Framework for Vision Transformers

Key Points

ViT-CAAC achieves over 76% reduction in model size with less than 0.4% Top-1 accuracy loss.
The framework integrates block-level knowledge distillation and layer-wise quantization for efficiency.
This research reveals a new approach for deploying high-performance vision models on resource-limited devices.
The implications extend to applications in autonomous systems, IoT, and real-time vision processing.
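
To put the headline figure in concrete terms, the arithmetic behind a 76% size reduction can be sketched as follows. The parameter count (86 million, roughly ViT-Base scale) and the 32-bit floating-point baseline are assumptions made for illustration; the source does not state which model variant or baseline precision was used.

```python
# Back-of-the-envelope view of a 76% model size reduction.
# NUM_PARAMS and the FP32 baseline are illustrative assumptions,
# not values reported in the paper.

FP32_BITS = 32


def size_mb(num_params, bits_per_weight):
    """Storage in megabytes for num_params weights at the given bit width."""
    return num_params * bits_per_weight / 8 / 1e6


def average_bits_for_reduction(reduction, baseline_bits=FP32_BITS):
    """Average storage budget per original weight that yields the given reduction."""
    return baseline_bits * (1.0 - reduction)


NUM_PARAMS = 86_000_000  # hypothetical, roughly ViT-Base scale

baseline_mb = size_mb(NUM_PARAMS, FP32_BITS)
target_bits = average_bits_for_reduction(0.76)
compressed_mb = size_mb(NUM_PARAMS, target_bits)

print(f"baseline:   {baseline_mb:.1f} MB")
print(f"compressed: {compressed_mb:.1f} MB")
print(f"average budget: {target_bits:.2f} bits per original weight")
```

Under these assumptions, a 344 MB checkpoint would shrink to roughly 82.6 MB, which corresponds to an average budget of about 7.7 bits per original weight. How that budget is split between quantization and sparsity is not specified in the source.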

Abstract

The Vision Transformer (ViT) has emerged as a powerful architecture for visual tasks: by capturing long-range dependencies within images, it demonstrates superior performance across a variety of applications. However, the large parameter count of ViTs, together with their high computational and memory demands, poses significant challenges. This paper introduces ViT-CAAC (Contribution-Aware Adaptive Compression Framework), a novel, multi-faceted compression framework designed to optimize ViTs. The framework integrates block-level knowledge distillation, layer-wise quantization with precision control across hierarchical layers, and adaptive sparsity into a cohesive approach that substantially reduces model size while preserving performance. Through rigorous experimentation on benchmark datasets, we demonstrate that the framework reduces model size by over 76% with minimal accuracy degradation (less than 0.4% Top-1 accuracy loss). This work offers a new approach to deploying high-performance vision models on resource-limited devices, with implications for applications in autonomous systems, IoT, and real-time vision processing.
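
The abstract does not give the quantization scheme, so the following is only a minimal sketch of what layer-wise precision control can look like, assuming plain uniform symmetric quantization. The layer names, weight values, and bit widths are hypothetical and are not taken from the paper.

```python
# Minimal sketch of layer-wise quantization with a per-layer bit width.
# Assumption: uniform symmetric quantization; the paper's actual scheme
# and its bit allocation rule are not described in the abstract.


def quantize_symmetric(weights, bits):
    """Map floats to signed integers in [-qmax, qmax] using a single scale."""
    qmax = 2 ** (bits - 1) - 1
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    quantized = [max(-qmax, min(qmax, round(w / scale))) for w in weights]
    return quantized, scale


def dequantize(quantized, scale):
    """Recover approximate float weights from integers and their scale."""
    return [q * scale for q in quantized]


def mean_abs_error(original, restored):
    return sum(abs(a - b) for a, b in zip(original, restored)) / len(original)


# Hypothetical layers: identical toy weights, different precision per layer.
toy_weights = [0.12, -0.5, 0.33, 0.9, -0.07]
layers = {
    "block0.attn": (toy_weights, 8),  # layer kept at higher precision
    "block1.mlp": (toy_weights, 4),   # layer compressed more aggressively
}

errors = {}
for name, (weights, bits) in layers.items():
    quantized, scale = quantize_symmetric(weights, bits)
    errors[name] = mean_abs_error(weights, dequantize(quantized, scale))
    print(f"{name}: {bits}-bit, mean abs error {errors[name]:.4f}")
```

The 4-bit layer shows a larger reconstruction error than the 8-bit layer, which is the trade-off a per-layer precision policy has to balance: layers that tolerate more error can take fewer bits, while sensitive layers keep more.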

Authors

Yu Zhang

Shanxi Agricultural University

Suping Peng

China University of Mining and Technology

Yao Xiao

Guangzhou University of Chinese Medicine

Institutions

Tsinghua University

Beijing Academy of Artificial Intelligence

Shanghai Artificial Intelligence Laboratory
