Hybrid models that combine CNNs and ViTs have recently emerged as state-of-the-art computer vision models. Quantization is a promising way to deploy these hybrid models efficiently on resource-constrained mobile/edge devices. However, post-training quantization (PTQ), which requires neither retraining nor labeled data, has not been extensively studied for hybrid models. In this study, we propose a novel PTQ technique specialized for CNN-transformer hybrid models that takes into account the hardware design of hybrid models on AI accelerators such as GPUs and FPGAs. First, we introduce quantization-aware distribution scaling to address the large outliers caused by inter-channel variance in convolution layers. Second, in the transformer block, we propose approximating the integer-only softmax with a linear function. This approach avoids costly FP32/INT32 multiplications, resulting in more efficient computation. Experimental results show that the proposed quantization method with INT8 precision incurs a 0.39% accuracy drop compared with the FP32 baseline on MobileViT-s with the ImageNet-1k dataset. Furthermore, when implemented on an FPGA platform, the proposed linear softmax reduces look-up table and flip-flop usage by 1.8x to 2.1x and 1.3x to 1.9x, respectively, compared with the existing second-order polynomial approximation. The code is available at https://github.com/IDSL-SeoulTech/HyQ.
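As a rough illustration of the outlier problem the abstract refers to, the sketch below compares symmetric INT8 quantization with one shared scale against one scale per channel. It is not the paper's quantization-aware distribution scaling; the helper names (`quantize_dequantize`, `max_abs_scale`, `mean_abs_error`) and the two-channel data are hypothetical and chosen only to show why a large range in one channel can wipe out the values in another.

```python
# Illustrative only: this is NOT the paper's method, just a demonstration of
# how inter-channel variance hurts a shared INT8 scale. All data is made up.

def quantize_dequantize(values, scale):
    """Round to the symmetric INT8 grid defined by `scale`, clamp, map back to float."""
    restored = []
    for v in values:
        q = max(-128, min(127, round(v / scale)))
        restored.append(q * scale)
    return restored


def max_abs_scale(values):
    """Symmetric scale that maps the largest magnitude onto 127."""
    return max(abs(v) for v in values) / 127.0


def mean_abs_error(original, restored):
    """Average absolute difference between original and restored values."""
    return sum(abs(a - b) for a, b in zip(original, restored)) / len(original)


# Hypothetical activations: channel 0 has a small range, channel 1 has large values.
channels = [
    [0.02, -0.05, 0.11, -0.08, 0.07],
    [12.0, -30.5, 45.0, -18.2, 27.3],
]

# Per-tensor: a single scale derived from every value in both channels.
shared_scale = max_abs_scale([v for ch in channels for v in ch])
per_tensor_err = [
    mean_abs_error(ch, quantize_dequantize(ch, shared_scale)) for ch in channels
]

# Per-channel: each channel gets a scale derived from its own range.
per_channel_err = [
    mean_abs_error(ch, quantize_dequantize(ch, max_abs_scale(ch))) for ch in channels
]

for i in range(len(channels)):
    print(f"channel {i}: per-tensor {per_tensor_err[i]:.6f}, "
          f"per-channel {per_channel_err[i]:.6f}")
```

With the shared scale, every value in the small-range channel rounds to zero, so its error equals the mean magnitude of the channel; with its own scale, the error drops to a fraction of the quantization step. The large-range channel is unaffected because it determines the shared scale in both cases.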
Nam Joon Kim
Seoul National University of Science and Technology
Jong-Ho Lee
Korea Institute of Civil Engineering and Building Technology
Hyun Kim
Seoul National University of Science and Technology
Kim et al. studied this question.
synapsesocial.com/papers/68e5ef86b6db64358758434c — DOI: https://doi.org/10.24963/ijcai.2024/474