February 13, 2021

4 A 5.99-to-691.1TOPS/W Tensor-Train In-Memory-Computing Processor Using Bit-Level-Sparsity-Based Optimization and Variable-Precision Quantization

Abstract

Computing-in-memory (CIM) improves energy efficiency by enabling parallel multiply-and-accumulate (MAC) operations and reducing memory accesses [1-4]. However, today's typical neural networks (NNs) usually exceed on-chip memory capacity, so a CIM-based processor may encounter a memory bottleneck [5]. Tensor-train (TT) is a tensor decomposition method that decomposes a $d$-dimensional tensor into $d$ 4D tensor-cores (TCs: $G_k[r_{k-1}, n_k, m_k, r_k]$, $k = 1, \dots, d$) [6]. $G_k$ can be viewed as a 2D $n_k \times m_k$ array in which each element is an $r_{k-1} \times r_k$ matrix. The TCs require $\sum_{k \in [1,d]} r_{k-1} n_k m_k r_k$ parameters to represent the original tensor, which has $\prod_{k \in [1,d]} n_k m_k$ parameters. Since $r_k$ is typically small, kernels and weight matrices of convolutional, fully-connected and recurrent layers can be compressed significantly by using TT decomposition, thereby enabling storage of an entire NN in a CIM-based processor.
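
To make the two parameter counts in the abstract concrete, the following is a minimal sketch that evaluates the TT sum and the full-tensor product for a given set of mode sizes and ranks. The function names, the example shapes (a 1024 x 1024 weight matrix factored as 4 x 8 x 8 x 4 on each side), the rank values, and the boundary convention r_0 = r_d = 1 are illustrative assumptions, not figures taken from the paper.

```python
from math import prod


def tt_param_count(n, m, r):
    """Parameters stored by the d tensor-cores: sum_k r[k-1] * n[k] * m[k] * r[k].

    n, m: mode sizes n_k and m_k (length d).
    r:    TT ranks r_0 .. r_d (length d + 1); r_0 = r_d = 1 is assumed here.
    """
    d = len(n)
    if len(m) != d or len(r) != d + 1:
        raise ValueError("expected len(m) == len(n) and len(r) == len(n) + 1")
    # Python index k runs 0..d-1, so r[k] plays the role of r_{k-1}
    # and r[k + 1] plays the role of r_k in the abstract's notation.
    return sum(r[k] * n[k] * m[k] * r[k + 1] for k in range(d))


def full_param_count(n, m):
    """Parameters of the original tensor: prod_k n[k] * m[k]."""
    return prod(nk * mk for nk, mk in zip(n, m))


# Hypothetical example values (not from the paper).
n = [4, 8, 8, 4]
m = [4, 8, 8, 4]
r = [1, 4, 4, 4, 1]

tt_params = tt_param_count(n, m, r)
full_params = full_param_count(n, m)
print(tt_params, full_params, full_params / tt_params)
```

With these assumed shapes the cores hold 2,176 parameters against 1,048,576 for the uncompressed matrix. This shows why small ranks give a large compression ratio: the TT count grows with the sum over cores, while the full count grows with the product over modes.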
