February 21, 2024

Efficient Neural Network Compression through Booth Coding and Exponential of Two Quantization for Enhanced Inference Performance

Abstract

AI applications are now deployed widely across the globe. We propose a method for compressing neural network weights that reduces the computational demands of network inference without compromising classification accuracy. The re-coding technique combines Booth coding with Exponential of Two quantization. We demonstrate the approach on convolutional neural networks (ConvNets) and fully connected neural networks (FNNs), and we introduce extended precision multipliers and exponential of two multipliers for use with it. With our quantization and recoding method, the size of the ConvNets is reduced by 31.87% and the size of the fully connected network model by 48.96%. These techniques also yield a 91.01% reduction in inference power for ConvNets and a 51.6% reduction for fully connected networks. The approach can further optimize power efficiency during network inference, particularly with Booth multipliers, with only minor adjustments to the signal encoding configuration. Finally, we show that the quantization and re-encoding applied across all models have a negligible impact on neural network accuracy.
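The two ingredients named in the abstract can be illustrated with a small sketch. This is not the authors' implementation. It assumes that "Exponential of Two quantization" means rounding each weight to a signed power of two, and it uses the textbook radix-4 Booth recoding of a two's complement integer. The function names, the exponent range, and the 8-bit width are hypothetical choices made for illustration only.

```python
import math

def quantize_pow2(w, min_exp=-8, max_exp=0):
    """Round w to a signed power of two (nearest in the log domain).

    Returns (sign, exp) so that the quantized weight is sign * 2**exp.
    The exponent range is an assumed, illustrative choice.
    """
    if w == 0.0:
        return 0, min_exp
    sign = 1 if w > 0 else -1
    exp = round(math.log2(abs(w)))
    exp = max(min_exp, min(max_exp, exp))
    return sign, exp

def multiply_pow2(activation, sign, exp):
    """Multiply by a power-of-two weight: only the exponent is adjusted."""
    return sign * math.ldexp(activation, exp)

def booth_radix4(x, bits=8):
    """Radix-4 Booth digits of a signed integer, least significant first.

    Each digit lies in {-2, -1, 0, 1, 2}, so a multiplier needs only
    bits / 2 partial products instead of bits.
    """
    assert bits % 2 == 0
    assert -(1 << (bits - 1)) <= x < (1 << (bits - 1))
    u = x & ((1 << bits) - 1)  # two's complement bit pattern
    digits = []
    prev = 0  # implicit bit to the right of the least significant bit
    for i in range(0, bits, 2):
        b0 = (u >> i) & 1
        b1 = (u >> (i + 1)) & 1
        digits.append(b0 + prev - 2 * b1)
        prev = b1
    return digits

def booth_value(digits):
    """Reconstruct the integer from its radix-4 Booth digits."""
    return sum(d * 4 ** i for i, d in enumerate(digits))
```

In this sketch a power-of-two weight is stored as a sign and a small exponent rather than a full-precision value, and a Booth-recoded operand halves the number of partial products a multiplier has to sum. Whether the paper's savings come from exactly these mechanisms is not stated in the abstract.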

Authors

V. A. Ashik Vijay

Institute of Power Engineering

S Abhishek

National Aerospace Laboratories

K Arunkumar

Saveetha University

Institutions

Saveetha University
