Emerging artificial intelligence-enabled Internet-of-Things (AI-IoT) systems-on-chip (SoCs) for augmented reality, personalized healthcare, and nanorobotics need to run many diverse tasks within a power envelope of a few tens of mW over a wide range of operating conditions. These tasks include compute-intensive but strongly quantized deep neural network (DNN) inference, as well as signal processing and control requiring high-precision floating point. We present MARSELLUS, an all-digital heterogeneous SoC for AI-IoT end-nodes fabricated in GlobalFoundries 22 nm FDX that combines: 1) a general-purpose cluster of 16 RISC-V digital signal processing (DSP) cores attuned to a diverse range of workloads, exploiting 4- and 2-bit arithmetic extensions (XpulpNN) combined with fused multiply-accumulate (MAC) and LOAD operations, as well as floating-point support; 2) a 2–8 bit reconfigurable binary engine (RBE) to accelerate 3×3 and 1×1 (pointwise) convolutions in DNNs; and 3) a set of on-chip monitoring (OCM) blocks connected to an adaptive body biasing (ABB) generator and a hardware control loop, enabling on-the-fly adaptation of transistor threshold voltages. MARSELLUS achieves up to 180 Gop/s or 3.32 Top/s/W on 2-bit precision arithmetic in software, and up to 637 Gop/s or 12.4 Top/s/W on hardware-accelerated DNN layers.
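To illustrate why 4- and 2-bit arithmetic extensions raise throughput, the sketch below emulates a packed-SIMD sum-of-dot-products on one 32-bit register: 16 signed 2-bit values (or 8 signed 4-bit values) are packed per word, and a single operation multiplies all lanes and adds the result to an accumulator. The helpers `pack`, `unpack`, and `sdotp` are hypothetical names chosen for illustration; they model only the lane-wise arithmetic under the assumption of two's-complement lanes and a wrapping 32-bit accumulator, not the actual XpulpNN instruction encoding or the fused MAC&LOAD behavior.

```python
def pack(values, bits):
    """Pack signed integers into one 32-bit word (lane 0 in the low bits).

    Hypothetical helper: assumes two's-complement lanes of `bits` width.
    """
    lanes = 32 // bits
    if len(values) != lanes:
        raise ValueError(f"expected {lanes} values for {bits}-bit lanes")
    lo, hi = -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    mask = (1 << bits) - 1
    word = 0
    for i, v in enumerate(values):
        if not lo <= v <= hi:
            raise ValueError(f"{v} does not fit in a signed {bits}-bit lane")
        word |= (v & mask) << (i * bits)
    return word


def unpack(word, bits):
    """Split a 32-bit word back into sign-extended lanes."""
    mask = (1 << bits) - 1
    out = []
    for i in range(32 // bits):
        raw = (word >> (i * bits)) & mask
        if raw >= 1 << (bits - 1):  # sign-extend negative lanes
            raw -= 1 << bits
        out.append(raw)
    return out


def sdotp(acc, a_word, b_word, bits):
    """Add the dot product of all lanes to `acc`, wrapping to signed 32 bits.

    Hypothetical emulation of a packed-SIMD sum-of-dot-products operation.
    """
    dot = sum(x * y for x, y in zip(unpack(a_word, bits), unpack(b_word, bits)))
    total = (acc + dot) & 0xFFFFFFFF
    return total - (1 << 32) if total >= 1 << 31 else total


# 2-bit lanes: 16 multiply-accumulates from a single packed operation.
a2 = [1, -2, 0, -1] * 4
b2 = [1, 1, -1, -2] * 4
print(sdotp(10, pack(a2, 2), pack(b2, 2), 2))  # 10 + 4 = 14

# 4-bit lanes: 8 multiply-accumulates per packed operation.
a4 = [7, -8, 3, -1, 0, 2, -5, 1]
b4 = [1, 1, 2, 3, 4, -1, 1, -7]
print(sdotp(0, pack(a4, 4), pack(b4, 4), 4))  # -12
```

Halving the operand width doubles the number of lanes per 32-bit register, so each packed operation performs 8 MACs at 4 bits and 16 MACs at 2 bits. This is the intuition behind the higher software throughput reported at 2-bit precision.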
Conti et al. studied this question.