Quantization and Training of Neural Networks for Efficient Integer-Arithmetic-Only Inference

Abstract

The rising popularity of intelligent mobile devices and the daunting computational cost of deep learning-based models call for efficient and accurate on-device inference schemes. We propose a quantization scheme that allows inference to be carried out using integer-only arithmetic, which can be implemented more efficiently than floating point inference on commonly available integer-only hardware. We also co-design a training procedure to preserve end-to-end model accuracy post quantization. As a result, the proposed quantization scheme improves the tradeoff between accuracy and on-device latency. The improvements are significant even on MobileNets, a model family known for run-time efficiency, and are demonstrated in ImageNet classification and COCO detection on popular CPUs.
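The abstract does not spell out the arithmetic. As a minimal sketch, assuming an affine mapping r = S(q - Z) between real values r and unsigned 8-bit codes q (scale S, zero point Z), the functions below show how a quantized dot product can be evaluated with integer operations only: products of zero-point-adjusted codes are summed in an integer accumulator, and the real rescale factor S_a * S_b / S_out is replaced by an integer multiplier plus a bit shift. `fake_quantize` shows the float-to-integer-to-float round trip that a quantization-aware training procedure can insert into the forward pass. All function names are hypothetical; bias addition, activation-range estimation and the exact rounding behavior of any production library are deliberately omitted.

```python
def choose_params(rmin, rmax, num_bits=8):
    """Pick (scale, zero_point) so reals in [rmin, rmax] map onto [0, 2**num_bits - 1]."""
    # Widen the range to contain 0.0 so that real zero is exactly representable.
    rmin, rmax = min(rmin, 0.0), max(rmax, 0.0)
    qmin, qmax = 0, (1 << num_bits) - 1
    if rmax == rmin:  # degenerate all-zero range
        return 1.0, 0
    scale = (rmax - rmin) / (qmax - qmin)
    zero_point = int(round(qmin - rmin / scale))
    return scale, max(qmin, min(qmax, zero_point))


def quantize(x, scale, zero_point, qmin=0, qmax=255):
    """Map a real value to an integer code: q = clamp(round(x / S) + Z)."""
    return max(qmin, min(qmax, int(round(x / scale)) + zero_point))


def dequantize(q, scale, zero_point):
    """Recover the approximate real value: r = S * (q - Z)."""
    return scale * (q - zero_point)


def fake_quantize(x, scale, zero_point):
    """Simulated quantization: float in, float out, snapped to the integer grid."""
    return dequantize(quantize(x, scale, zero_point), scale, zero_point)


def quantize_multiplier(m):
    """Express a real 0 < m < 1 as m0 * 2**-(31 + shift), with m0 in [2**30, 2**31)."""
    assert 0.0 < m < 1.0
    shift = 0
    while m < 0.5:  # normalize m into [0.5, 1)
        m *= 2.0
        shift += 1
    m0 = int(round(m * (1 << 31)))
    if m0 == 1 << 31:  # rounding pushed the mantissa up to exactly 1.0
        m0 //= 2
        shift -= 1
    return m0, shift


def rescale(acc, m0, shift):
    """Integer-only multiply of acc by m0 * 2**-(31 + shift), rounding half toward +inf."""
    total = 31 + shift
    return (acc * m0 + (1 << (total - 1))) >> total


def quantized_dot(qa, za, qb, zb, m0, shift, z_out):
    """Dot product of two uint8-coded vectors using only integer arithmetic."""
    acc = 0  # a 32-bit accumulator on real hardware; Python ints cannot overflow
    for a, b in zip(qa, qb):
        acc += (a - za) * (b - zb)
    return max(0, min(255, rescale(acc, m0, shift) + z_out))


# Usage: compare against the float result 0.5 - 0.25 - 1.0 = -0.75.
a, b = [0.5, -1.0, 2.0], [1.0, 0.25, -0.5]
sa, za = choose_params(min(a), max(a))
sb, zb = choose_params(min(b), max(b))
s_out, z_out = choose_params(-1.0, 1.0)  # assumed output range
m0, shift = quantize_multiplier(sa * sb / s_out)
q = quantized_dot([quantize(x, sa, za) for x in a], za,
                  [quantize(x, sb, zb) for x in b], zb, m0, shift, z_out)
print(dequantize(q, s_out, z_out))  # close to -0.75, up to quantization error
```

Keeping the rescale factor as an integer mantissa and a shift is what lets the whole inner loop avoid floating point; the only floating-point work in this sketch happens offline, when the scales and the multiplier are chosen.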
