AI applications are now deployed worldwide. We propose a method for compressing neural network weights that reduces the computational cost of inference without compromising classification accuracy. The proposed re-coding technique combines Booth coding with exponential-of-two (power-of-two) quantization. We evaluate the approach on convolutional neural networks (ConvNets) and fully connected neural networks (FNNs), and we introduce extended-precision multipliers and exponential-of-two multipliers for use with it. With our quantization and re-coding method, the size of the ConvNets is reduced by 31.87% and the size of the fully connected network model by 48.96%. Applying these techniques also reduces power consumption by 91.01% for ConvNets and by 51.6% for fully connected networks. The approach can further improve power efficiency during inference, particularly with Booth multipliers, after minor adjustments to the signal encoding configuration. Finally, we show that the quantization and re-encoding steps applied across all models have a negligible impact on neural network accuracy, so the integrity of the models is preserved.
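As a rough illustration of the two building blocks named in the abstract, the sketch below rounds a weight to the nearest signed power of two and applies radix-4 Booth recoding to a signed integer. It is not the authors' scheme: the function names, the exponent range (`min_exp`, `max_exp`), the 8-bit width, and the choice of radix-4 are all assumptions made for the example, since the abstract does not specify them.

```python
import math

def quantize_pow2(w, min_exp=-7, max_exp=0):
    """Round w to the nearest signed power of two (nearest in the log domain).

    Returns (sign, exponent); sign == 0 encodes a zero weight.
    The exponent range is an assumed, illustrative choice.
    """
    if w == 0.0:
        return 0, min_exp
    sign = 1 if w > 0 else -1
    exp = round(math.log2(abs(w)))
    exp = max(min_exp, min(max_exp, exp))  # clamp to the representable range
    return sign, exp

def dequantize_pow2(sign, exp):
    """Recover the quantized weight value sign * 2**exp."""
    return sign * 2.0 ** exp

def booth_radix4(value, bits=8):
    """Radix-4 Booth recoding of a signed integer (two's complement, even width).

    Returns digits in {-2, -1, 0, 1, 2}, least significant digit first,
    such that sum(d * 4**k) == value.
    """
    assert bits % 2 == 0
    assert -(1 << (bits - 1)) <= value < (1 << (bits - 1))
    u = value & ((1 << bits) - 1)  # two's-complement bit pattern

    def bit(i):
        # The bit below the least significant bit is defined as 0.
        return 0 if i < 0 else (u >> i) & 1

    # Each digit looks at an overlapping window of three bits.
    return [bit(i - 1) + bit(i) - 2 * bit(i + 1) for i in range(0, bits, 2)]

def booth_value(digits):
    """Evaluate a radix-4 Booth digit list back to an integer."""
    return sum(d * 4 ** k for k, d in enumerate(digits))

if __name__ == "__main__":
    print(dequantize_pow2(*quantize_pow2(0.3)))
    print(booth_radix4(7), booth_value(booth_radix4(7)))
```

In this sketch, multiplying an activation by a power-of-two weight reduces to a shift and a sign change, while Booth recoding halves the number of partial products a multiplier has to accumulate; how the paper combines the two is not described in the abstract.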
V. A. Ashik Vijay
Institute of Power Engineering
S Abhishek
National Aerospace Laboratories
K Arunkumar
Saveetha University
Vijay et al. studied this question.
synapsesocial.com/papers/68e785a8b6db6435876f8376 — DOI: https://doi.org/10.1109/iciptm59628.2024.10563484