What question did this study set out to answer?

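This study investigates how much structural redundancy exists in the multi-head attention of vision transformers, with the aim of making these models efficient enough to deploy on resource-constrained edge devices.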

June 17, 2026

Research on Static Structured Pruning of Vision Transformers Based on L1 Norm and Uniform Layer-wise Constraint: A Case Study of DeiT

Key Points

Proposes a static structured pruning method that combines an L1-norm importance score with a uniform layer-wise constraint.
Focuses on the DeiT-small model for a fine-grained classification task.
Runs experiments on the CIFAR-100 dataset to measure the impact of pruning attention heads.
Removing 1 attention head per layer (16.7% of all heads) reduces parameters by 5.45% (to 20.52M), with 86.12% accuracy after fine-tuning.
Pruning 3 heads per layer (50.0% of all heads) reduces parameters by 16.34% (to 18.16M), with accuracy at 82.08%.
The findings quantify the redundancy of attention heads in DeiT for fine-grained tasks.
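
The reported parameter reductions can be cross-checked with simple arithmetic. The sketch below is not taken from the paper. It assumes the standard DeiT-small configuration (embedding width 384, 12 layers, 6 heads of width 64, 16x16 patches on 224x224 inputs, no distillation token) with a 100-class classifier, and it assumes that pruning a head removes its query, key, and value weights and biases together with the matching columns of the output projection. The function name `deit_small_params` is hypothetical.

```python
def deit_small_params(heads_pruned_per_layer=0, num_classes=100):
    """Count parameters of a DeiT-small-style ViT after uniform head pruning.

    All architecture constants below are assumptions (standard DeiT-small),
    not values stated in the study.
    """
    dim, depth, heads, mlp_ratio = 384, 12, 6, 4
    head_dim = dim // heads                              # 64 channels per head
    patch, img, chans = 16, 224, 3
    kept = (heads - heads_pruned_per_layer) * head_dim   # attention width after pruning

    n_patches = (img // patch) ** 2                      # 196 patch tokens
    embed = patch * patch * chans * dim + dim            # patch projection weight + bias
    tokens = dim + (n_patches + 1) * dim                 # class token + position embeddings

    qkv = dim * 3 * kept + 3 * kept                      # Q, K, V weights + biases
    proj = kept * dim + dim                              # output projection; bias is untouched
    mlp = 2 * dim * dim * mlp_ratio + dim * mlp_ratio + dim
    norms = 2 * 2 * dim                                  # two LayerNorms (scale + shift) per block
    block = qkv + proj + mlp + norms

    final = 2 * dim + dim * num_classes + num_classes    # final LayerNorm + classifier
    return embed + tokens + depth * block + final


base = deit_small_params(0)
for k in (1, 3):
    pruned = deit_small_params(k)
    reduction = 100 * (base - pruned) / base
    print(f"{k} head(s) per layer: {pruned / 1e6:.2f}M parameters, {reduction:.2f}% fewer")
```

Under these assumptions the counts come to roughly 21.70M, 20.52M, and 18.16M parameters, which is consistent with the 5.45% and 16.34% reductions listed above.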

Abstract

Vision Transformers (e.g., DeiT) have demonstrated strong performance in image classification tasks, yet their large parameter counts limit their deployment on resource-constrained edge devices. Focusing on a fine-grained image classification task (CIFAR-100), this paper investigates the structural redundancy within the multi-head attention mechanism of the DeiT-small model. We propose a static structured pruning method based on the L1 norm combined with a uniform layer-wise constraint. The method scores each attention head statically by the L1 norm of its output projection weights and removes the same number of low-scoring heads from every Transformer layer. Because every layer keeps at least one head, this avoids the tensor dimension mismatch that occurs when all heads in a single layer are pruned. Experimental results indicate that removing 1 attention head per layer (16.7% of all heads) reduces the parameter count by 5.45% (down to 20.52M), while accuracy after fine-tuning reaches 86.12%. When pruning is increased to 3 heads per layer (50.0% of all heads), the parameter count is reduced by 16.34% (down to 18.16M), and accuracy is maintained at 82.08%. This study quantifies the redundancy boundary of attention heads in DeiT for fine-grained tasks, providing an empirical reference for model lightweighting.
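
The head-selection step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that a head's importance is the sum of absolute values over that head's columns of the output projection matrix, with the columns laid out as concatenated head outputs, and it works on plain nested lists rather than a real model. The names `head_l1_scores` and `select_heads_to_keep` are hypothetical.

```python
import random


def head_l1_scores(proj_weight, num_heads):
    """Return the L1 norm of each head's slice of an output projection matrix.

    proj_weight is a list of rows shaped (out_dim, in_dim). Assumption: the
    input columns are the concatenated head outputs, so head h owns columns
    [h * head_dim, (h + 1) * head_dim).
    """
    in_dim = len(proj_weight[0])
    head_dim = in_dim // num_heads
    scores = [0.0] * num_heads
    for row in proj_weight:
        for col, weight in enumerate(row):
            scores[col // head_dim] += abs(weight)
    return scores


def select_heads_to_keep(layer_weights, num_heads, prune_per_layer):
    """Prune the same number of lowest-scoring heads from every layer."""
    if not 0 <= prune_per_layer < num_heads:
        # Uniform constraint: no layer may lose all of its heads.
        raise ValueError("each layer must keep at least one head")
    kept_per_layer = []
    for weight in layer_weights:
        scores = head_l1_scores(weight, num_heads)
        ranked = sorted(range(num_heads), key=lambda h: scores[h], reverse=True)
        kept_per_layer.append(sorted(ranked[:num_heads - prune_per_layer]))
    return kept_per_layer


# Toy usage: 3 layers, 6 heads of width 4, random weights standing in for a model.
random.seed(0)
num_heads, head_dim, depth = 6, 4, 3
dim = num_heads * head_dim
layers = [
    [[random.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(dim)]
    for _ in range(depth)
]
kept = select_heads_to_keep(layers, num_heads, prune_per_layer=1)
print(kept)  # one list of 5 surviving head indices per layer
```

Scoring is static in the sense that it reads only the trained weights and needs no forward passes or data. In a real model the surviving indices would then be used to slice the query, key, value, and output projection tensors before fine-tuning.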
