Deep neural network (DNN) models are used in many commercial applications, such as virtual assistants and chatbots, where their accuracy is a clear benefit. That accuracy comes at the cost of large model sizes and substantially increased computation and storage requirements, which in turn consume more power. Quantization approaches address these problems by reducing model size while minimising accuracy loss. This study focuses on using quantization to improve model speed, reduce computational cost, compress model size, and make models more energy efficient. Quantization techniques are broadly categorised as quantization-aware training and post-training quantization. The discussion of the former covers full quantization and batch normalization, whereas the discussion of the latter covers quantizing the weights, the activations, and the weights and activations together.
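As a rough illustration of the post-training weight quantization mentioned above, the following minimal sketch maps floating-point weights to 8-bit integer codes with a single per-tensor scale. It is not taken from the paper: the symmetric int8 scheme, the function names `quantize_symmetric_int8` and `dequantize`, and the sample weights are all assumptions chosen for illustration.

```python
def quantize_symmetric_int8(weights):
    """Quantize floats to integer codes in [-127, 127] using one shared scale.

    Assumed scheme (not from the paper): symmetric, per-tensor, 8-bit.
    """
    max_abs = max((abs(w) for w in weights), default=0.0)
    # Avoid division by zero when every weight is 0.
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    # Round to the nearest integer code and clamp to the int8 range.
    codes = [max(-127, min(127, round(w / scale))) for w in weights]
    return codes, scale


def dequantize(codes, scale):
    """Recover approximate float weights from the integer codes."""
    return [c * scale for c in codes]


# Hypothetical sample weights, for illustration only.
weights = [0.5, -1.27, 0.0, 1.0]
codes, scale = quantize_symmetric_int8(weights)
restored = dequantize(codes, scale)

# The rounding error per weight is bounded by half a quantization step.
max_error = max(abs(w - r) for w, r in zip(weights, restored))
print(codes, scale, max_error)
```

Each weight is stored as one small integer instead of a 32-bit float, which is where the reduction in model size comes from; the price is a rounding error of at most half of `scale` per weight. Quantization-aware training differs in that this rounding is simulated during training so the model can adapt to it.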
Kulkarni et al. (Tue,) studied this question.