The rising popularity of intelligent mobile devices and the daunting computational cost of deep learning-based models call for efficient and accurate on-device inference schemes. We propose a quantization scheme that allows inference to be carried out using integer-only arithmetic, which can be implemented more efficiently than floating point inference on commonly available integer-only hardware. We also co-design a training procedure to preserve end-to-end model accuracy post quantization. As a result, the proposed quantization scheme improves the tradeoff between accuracy and on-device latency. The improvements are significant even on MobileNets, a model family known for run-time efficiency, and are demonstrated in ImageNet classification and COCO detection on popular CPUs.
Jacob et al. studied this question.
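The abstract does not spell out the arithmetic behind integer-only inference. The pure-Python sketch below illustrates the general idea. It assumes the affine mapping r ≈ S(q - Z) with unsigned 8-bit codes that is commonly used for 8-bit quantization. Under that assumption, a dot product is accumulated in integers, and the floating-point rescale is folded into one fixed-point multiplier followed by a rounding right shift. The helper `fake_quantize` illustrates simulated quantization, which is one common way to expose quantization error during training. The abstract does not say whether this matches the authors' training procedure. All function names, the code range, and the example values are hypothetical. This is not the authors' implementation.

```python
# Hypothetical sketch: affine uint8 quantization r ~= S * (q - Z) and an
# integer-only dot product. Not the authors' code; names are illustrative.

QMIN, QMAX = 0, 255  # assumed unsigned 8-bit code range


def choose_params(rmin, rmax):
    """Pick scale S and integer zero point Z so [rmin, rmax] maps onto [QMIN, QMAX]."""
    rmin, rmax = min(rmin, 0.0), max(rmax, 0.0)  # keep real 0.0 in range
    scale = (rmax - rmin) / (QMAX - QMIN) or 1.0  # avoid S = 0 for an all-zero range
    zero_point = int(round(QMIN - rmin / scale))
    return scale, max(QMIN, min(QMAX, zero_point))


def quantize(xs, scale, zero_point):
    """Map reals to clamped integer codes."""
    return [max(QMIN, min(QMAX, int(round(x / scale)) + zero_point)) for x in xs]


def dequantize(qs, scale, zero_point):
    """Map integer codes back to reals."""
    return [scale * (q - zero_point) for q in qs]


def fake_quantize(xs, scale, zero_point):
    """Quantize-then-dequantize in floating point (simulated quantization)."""
    return dequantize(quantize(xs, scale, zero_point), scale, zero_point)


def quantize_multiplier(m):
    """Approximate a real 0 < m < 1 as m0 * 2**(-shift) with a 31-bit integer m0."""
    shift = 0
    while m < 0.5:  # normalize m into [0.5, 1)
        m *= 2.0
        shift += 1
    m0 = int(round(m * (1 << 31)))
    if m0 == 1 << 31:  # rounding reached 2**31; renormalize
        m0 //= 2
        shift -= 1
    return m0, shift + 31


def int_dot(q1, z1, q2, z2, m0, shift, z3):
    """Dot product using only integer operations, requantized to a uint8 code."""
    acc = sum((a - z1) * (b - z2) for a, b in zip(q1, q2))  # integer accumulator
    # Fixed-point rescale: multiply by m0, then rounding right shift.
    scaled = (acc * m0 + (1 << (shift - 1))) >> shift
    return max(QMIN, min(QMAX, scaled + z3))


# Usage: compare against the float dot product 0.5*1.0 - 1.0*0.25 - 2.0*0.5 = -0.75.
a, b = [0.5, -1.0, 2.0], [1.0, 0.25, -0.5]
sa, za = choose_params(min(a), max(a))
sb, zb = choose_params(min(b), max(b))
so, zo = choose_params(-1.0, 1.0)  # assumed output range
m0, shift = quantize_multiplier(sa * sb / so)
q_out = int_dot(quantize(a, sa, za), za, quantize(b, sb, zb), zb, m0, shift, zo)
print(dequantize([q_out], so, zo)[0])  # approximately -0.75
```

The zero point is kept an integer so that the real value 0.0 maps exactly to a code. After accumulation, the only non-integer quantity, the ratio of scales, is represented as an integer multiplier `m0` and a shift.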