March 18, 2024 · Open Access

Bandwidth-Efficient Inference for Neural Image Compression

Abstract

With neural networks growing deeper and feature maps growing larger, limited communication bandwidth with external memory (DRAM) and power constraints become a bottleneck in implementing network inference on mobile and edge devices. In this paper, we propose an end-to-end differentiable, bandwidth-efficient neural inference method in which activations are compressed by a neural data compression method. Specifically, we propose a transform-quantization-entropy coding pipeline for activation compression, with symmetric exponential Golomb coding and a data-dependent Gaussian entropy model for arithmetic coding. Optimized with existing model quantization methods, the low-level task of image compression can achieve up to 19× bandwidth reduction with 6.21× energy saving. The code implementation is available at https://github.com/xyzysz/Bandwidth_efficient_nic.
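As a rough illustration of the entropy-coding stage mentioned in the abstract, the sketch below encodes already-quantized integer activations with an order-0 exponential Golomb code. It is not the authors' implementation: the abstract does not define "symmetric", so the signed-to-unsigned mapping here (0, -1, 1, -2, 2, ... mapped to 0, 1, 2, 3, 4, ...) is an assumption, and the function names are hypothetical. Bits are kept as a text string for readability rather than packed into bytes.

```python
from typing import List


def signed_to_unsigned(v: int) -> int:
    # Assumed mapping: values of equal magnitude get neighbouring code lengths.
    return 2 * v if v >= 0 else -2 * v - 1


def unsigned_to_signed(u: int) -> int:
    # Inverse of signed_to_unsigned.
    return u // 2 if u % 2 == 0 else -((u + 1) // 2)


def exp_golomb_encode(values: List[int]) -> str:
    """Encode signed integers as a string of '0'/'1' (order-0 exp-Golomb)."""
    out = []
    for v in values:
        x = signed_to_unsigned(v) + 1      # shift so that x >= 1
        bits = bin(x)[2:]                  # binary form without the '0b' prefix
        out.append("0" * (len(bits) - 1))  # prefix: one zero per extra bit
        out.append(bits)
    return "".join(out)


def exp_golomb_decode(stream: str) -> List[int]:
    """Decode a bit string produced by exp_golomb_encode."""
    values = []
    i = 0
    while i < len(stream):
        zeros = 0
        while stream[i] == "0":            # count the zero prefix
            zeros += 1
            i += 1
        x = int(stream[i:i + zeros + 1], 2)
        i += zeros + 1
        values.append(unsigned_to_signed(x - 1))
    return values


if __name__ == "__main__":
    activations = [0, -1, 1, 3, -4]        # toy quantized activations
    bitstream = exp_golomb_encode(activations)
    print(bitstream, len(bitstream), "bits")
    print(exp_golomb_decode(bitstream))
```

Small-magnitude values, which tend to dominate quantized activations, receive the shortest codewords, which is where the bandwidth saving in a scheme of this kind would come from.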

Cite this study

Yin et al. (2024). Bandwidth-Efficient Inference for Neural Image Compression.

synapsesocial.com/papers/68e7398bb6db6435876b2bd5 — DOI: https://doi.org/10.1109/icassp48485.2024.10446809

Authors

Shanzhi Yin

City University of Hong Kong

Tongda Xu

University of California, Riverside

Yongsheng Liang

Shenzhen Institute of Information Technology

Institutions

Tsinghua University

Harbin Institute of Technology
