What type of study is this?

This is an experimental study.

September 30, 2025 | Open Access

ViT-CAAC: Contribution-Aware Adaptive Compression Framework for Vision Transformers

Key Points

ViT-CAAC achieves over 76% reduction in model size with less than 0.4% Top-1 accuracy loss.
The framework integrates block-level knowledge distillation and layer-wise quantization for efficiency.
This research reveals a new approach for deploying high-performance vision models on resource-limited devices.
The implications extend to applications in autonomous systems, IoT, and real-time vision processing.

Abstract

The Vision Transformer (ViT) has emerged as a powerful architecture for visual tasks because it captures long-range dependencies within images, and it has demonstrated superior performance across a variety of applications. However, the large parameter count and the high computational and memory demands of ViTs pose significant challenges for deployment. This paper introduces ViT-CAAC (Contribution-Aware Adaptive Compression Framework), a novel, multi-faceted compression framework designed to optimize ViTs. The framework integrates block-level knowledge distillation, layer-wise quantization with precision control across hierarchical layers, and adaptive sparsity into a cohesive approach that substantially reduces model size while preserving performance. Through rigorous experimentation on benchmark datasets, we demonstrate that the framework achieves over 76% reduction in model size with minimal accuracy degradation (less than 0.4% Top-1 accuracy loss). This work establishes a new approach to deploying high-performance vision models on resource-limited devices, with implications for applications in autonomous systems, IoT, and real-time vision processing.
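
To give a sense of scale for the layer-wise quantization component, the following is a minimal sketch and not the authors' implementation. It assumes a 32-bit floating-point baseline, symmetric uniform quantization, and purely hypothetical layer sizes and bit-widths; the helper names `uniform_quantize` and `size_reduction` are invented for illustration. It ignores the distillation and sparsity components described in the abstract, as well as storage overhead such as per-layer scale factors.

```python
def uniform_quantize(values, bits):
    """Symmetric uniform quantization to `bits` bits, followed by dequantization.

    Illustrative assumption: the abstract does not specify the quantizer used.
    """
    max_abs = max(abs(v) for v in values)
    if max_abs == 0.0:
        return list(values)
    levels = 2 ** (bits - 1) - 1      # e.g. 127 representable magnitudes for 8 bits
    scale = max_abs / levels          # step size between adjacent quantized values
    return [round(v / scale) * scale for v in values]


def size_reduction(layer_params, layer_bits, baseline_bits=32):
    """Fraction of storage saved when each layer uses its own bit-width."""
    baseline = sum(layer_params) * baseline_bits
    compressed = sum(n * b for n, b in zip(layer_params, layer_bits))
    return 1.0 - compressed / baseline


# Hypothetical three-layer model: parameter counts and per-layer precisions
# are made up for this example and are not taken from the paper.
params = [1_000_000, 2_000_000, 1_000_000]
bits = [8, 8, 4]
print(size_reduction(params, bits))   # 0.78125

# Quantizing a small weight vector to 4 bits introduces a bounded rounding error.
weights = [0.5, -1.0, 0.25, 0.8]
print(uniform_quantize(weights, 4))
```

Under these made-up settings, mixing 8-bit and 4-bit layers already saves about 78% of the storage relative to 32-bit weights, which is the same order of magnitude as the reduction reported above. Which layers tolerate lower precision is exactly what a contribution-aware scheme would need to decide; this sketch simply fixes the bit-widths by hand.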
