Traditional digital implementations of neural accelerators are limited by high power and area overheads, while analog and non-CMOS implementations suffer from noise, device mismatch, and reliability issues. This paper introduces a CMOS Look-Up Table (LUT)-based Neural Accelerator (LUT-NA) framework that reduces the power, latency, and area consumption of traditional digital accelerators through pre-computed, faster look-ups while avoiding the noise and mismatch of analog circuits. To solve the scalability issues of conventional LUT-based computation, we split the high-precision multiply and accumulate (MAC) operations into lower-precision MACs using a divide-and-conquer-based approach. We show that LUT-NA achieves up to 29.54× lower area with 3.34× lower energy per inference task than traditional LUT-based techniques and up to 1.23× lower area with 1.80× lower energy per inference task than conventional digital MAC-based techniques (Wallace Tree/Array Multipliers) without retraining and without affecting accuracy, even on lottery ticket pruned (LTP) models that already reduce the number of required MAC operations by up to 98%. Finally, we introduce mixed-precision analysis in the LUT-NA framework for various LTP models (VGG11, VGG19, Resnet18, Resnet34, GoogleNet) that achieves 32.22× to 50.95× lower area across models with 3.68× to 6.25× lower energy per inference than traditional LUT-based techniques, and 1.35× to 2.14× lower area with 1.99× to 3.38× lower energy per inference across models compared to conventional digital MAC-based techniques, with 1% accuracy loss.
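The divide-and-conquer idea can be illustrated with a minimal sketch. The abstract does not state the bit widths or the exact decomposition used in LUT-NA, so the choices here are assumptions made only for illustration: unsigned 8-bit operands, a split into 4-bit halves, and the hypothetical names `LUT4`, `lut_mul8`, and `lut_mac`. A direct table of all 8-bit by 8-bit products needs 65,536 entries, whereas a table of 4-bit by 4-bit products needs only 256; four look-ups in the small table, combined with shifts and adds, reproduce the full product.

```python
# Illustrative sketch only: bit widths, names, and the decomposition are
# assumptions for this example, not details taken from the paper.
NIBBLE_BITS = 4
NIBBLE_MASK = (1 << NIBBLE_BITS) - 1

# Pre-computed table of every 4-bit x 4-bit product (16 x 16 = 256 entries).
LUT4 = [[a * b for b in range(1 << NIBBLE_BITS)]
        for a in range(1 << NIBBLE_BITS)]


def lut_mul8(x: int, w: int) -> int:
    """Multiply two unsigned 8-bit values using only LUT4 look-ups, shifts, and adds."""
    if not (0 <= x < 256 and 0 <= w < 256):
        raise ValueError("operands must be unsigned 8-bit integers")
    # Split each operand into a high and a low 4-bit half.
    xh, xl = x >> NIBBLE_BITS, x & NIBBLE_MASK
    wh, wl = w >> NIBBLE_BITS, w & NIBBLE_MASK
    # x * w = (xh*wh << 8) + ((xh*wl + xl*wh) << 4) + xl*wl
    return ((LUT4[xh][wh] << (2 * NIBBLE_BITS))
            + ((LUT4[xh][wl] + LUT4[xl][wh]) << NIBBLE_BITS)
            + LUT4[xl][wl])


def lut_mac(xs, ws) -> int:
    """Accumulate the LUT-based products of paired inputs and weights."""
    acc = 0
    for x, w in zip(xs, ws):
        acc += lut_mul8(x, w)
    return acc
```

For example, `lut_mac([1, 2, 3], [4, 5, 6])` accumulates three LUT-based products. The trade-off in this sketch is a much smaller table in exchange for a few extra shifts and additions per multiply.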
Sen et al. studied this question.