July 26, 2024

HyQ: Hardware-Friendly Post-Training Quantization for CNN-Transformer Hybrid Networks

Abstract

Hybrid models that combine CNNs and vision transformers (ViTs) have recently emerged as state-of-the-art computer vision models. Quantization is a promising way to deploy these hybrid models efficiently on resource-constrained mobile and edge devices. However, post-training quantization (PTQ), which requires neither retraining nor labeled data, has not been extensively studied for hybrid models. In this study, we propose a novel PTQ technique specialized for CNN-transformer hybrid models that takes into account how these models are implemented in hardware on AI accelerators such as GPUs and FPGAs. First, we introduce quantization-aware distribution scaling to address the large outliers caused by inter-channel variance in convolution layers. Second, in the transformer block, we propose approximating the integer-only softmax with a linear function, which avoids costly FP32/INT32 multiplications and makes the computation more efficient. Experimental results show that the proposed method with INT8 precision incurs a 0.39% accuracy drop relative to the FP32 baseline on MobileViT-s with the ImageNet-1k dataset. Furthermore, when implemented on an FPGA platform, the proposed linear softmax reduces look-up table and flip-flop usage by 1.8x to 2.1x and 1.3x to 1.9x, respectively, compared with the existing second-order polynomial approximation. The code is available at https://github.com/IDSL-SeoulTech/HyQ.
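
To make the inter-channel variance problem mentioned in the abstract concrete, the following is a minimal sketch of why a single INT8 scale shared across channels struggles when one channel contains large outliers. It is not the paper's quantization-aware distribution scaling. The activation values and all function names are hypothetical, and the comparison against per-channel scales is a standard illustration chosen here, not something taken from the paper.

```python
# Illustration only: symmetric INT8 "fake quantization" with a shared scale
# versus one scale per channel. All values and names are hypothetical.

def quantize_dequantize(values, scale):
    """Round each value to the INT8 grid, clamp to [-127, 127], map back to float."""
    out = []
    for v in values:
        q = max(-127, min(127, round(v / scale)))
        out.append(q * scale)
    return out

def max_abs_scale(values):
    """Scale that maps the largest magnitude onto the INT8 limit 127."""
    return max(abs(v) for v in values) / 127.0

def mean_abs_error(original, restored):
    return sum(abs(a - b) for a, b in zip(original, restored)) / len(original)

# Two channels of one layer: one with tiny values, one with large outliers.
channels = {
    "small": [0.01, -0.02, 0.015, -0.005],
    "large": [12.0, -30.0, 25.0, -8.0],
}

# Per-tensor: a single scale, dominated by the outlier channel.
shared_scale = max_abs_scale([v for vals in channels.values() for v in vals])
per_tensor_err = {
    name: mean_abs_error(vals, quantize_dequantize(vals, shared_scale))
    for name, vals in channels.items()
}

# Per-channel: each channel gets a scale fitted to its own range.
per_channel_err = {
    name: mean_abs_error(vals, quantize_dequantize(vals, max_abs_scale(vals)))
    for name, vals in channels.items()
}

for name in channels:
    print(name, per_tensor_err[name], per_channel_err[name])
```

With the shared scale, every value in the small channel rounds to zero, so that channel's information is lost entirely, while a scale fitted to the channel's own range keeps the error far smaller. The large channel is unaffected because it determines the shared scale in the first place.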

Authors

Nam Joon Kim

Seoul National University of Science and Technology

Jong-Ho Lee

Korea Institute of Civil Engineering and Building Technology

Hyun Kim

Seoul National University of Science and Technology

Institutions

Seoul National University of Science and Technology
