A 28-nm 64-kb 31.6-TFLOPS/W Digital-Domain Floating-Point-Computing-Unit and Double-Bit 6T-SRAM Computing-in-Memory Macro for Floating-Point CNNs

Key Points

Key points are not available for this paper at this time.

Abstract

With the rapid advancement of artificial intelligence (AI), computing-in-memory (CIM) structure is proposed to improve energy efficiency (EF). However, previous CIMs often rely on INT8 data types, which pose challenges when addressing more complex networks, larger datasets, and increasingly intricate tasks. This work presents a double-bit 6T static random-access memory (SRAM) -based floating-point CIM macro using: 1) a cell array with double-bitcells (DBcells) and floating-point computing units (FCUs) to improve throughput without the sacrifice of inference accuracy; 2) an FCU with high-bit full-precision multiply cell (HFMC) and low-bit approximate-calculation multiply cell (LAMC) to reduce internal bandwidth and area cost; 3) a CIM macro architecture with FP processing circuits to support both floating-point MAC (FP-MAC) and integer (INT) -multiplication and accumulation (MAC) ; 4) a new ShareFloatv2 data type to map floating point in CIM array; and 5) a lookup table (LUT) -based Tensorflow training method to improve inference accuracy. A fabricated 28-nm 64-kb digital-domain SRAM-CIM macro achieved the best EF (31. 6 TFLOPS/W) and the highest area efficiency (2. 05 TFLOPS/mm ^2) for FP-MAC with Brain Float16 (BF16) IN/W/OUT on three AI tasks: classification@CIFAR100, detection@COCO, and segmentation@VOC2012.

Bookmark

A 28-nm 64-kb 31.6-TFLOPS/W Digital-Domain Floating-Point-Computing-Unit and Double-Bit 6T-SRAM Computing-in-Memory Macro for Floating-Point CNNs

Key Points

Abstract

Cite This Study