With the increasing demand for deploying machine learning models on energy-efficient, low-latency devices, TinyML stands out as an efficient solution for enabling intelligence on resource-constrained edge devices. TinyML workloads need energy-efficient hardware resources for reliable deployment of machine learning models, yet existing hardware often lacks such resources and cannot perform these computations efficiently. The multiply-accumulate (MAC) unit plays a key role in determining the energy efficiency of edge-constrained TinyML hardware. To bridge this gap, this work presents a novel architecture: a low-power, dynamic bit-width-adaptive 8-bit multiply-accumulate unit for TinyML accelerators. The architecture introduces a dynamic, multi-precision, bit-width-adaptive computational capability, supporting mixed-precision modes such as 2 × 2, 2 × 4, 2 × 8, 4 × 4, 4 × 8 and 8 × 8 with signed × unsigned support, making it highly scalable for TinyML accelerators. In addition, zero-aware gating and clock gating are implemented using a shift-and-add-based multiplier, which enables partial-product elimination, and a hybrid carry-lookahead adder (CLA) based accumulator, which enables dynamic segment-wise activation, both targeting energy efficiency in TinyML accelerators. The proposed architecture is simulated and verified with the eSim EDA tool and synthesized at the 130 nm technology node using Google SkyWater’s SKY130 PDK and the open-source EDA toolchain OpenLANE. The proposed multiply-accumulate unit reduces power by 59.36%, 68.78%, 74% and 80% compared to PS4MAC, a state-of-the-art (SotA) mixed-precision MAC, the Synopsys DesignWare (DW) MAC and an approximate MAC unit, respectively. Compared to prior work, the proposed design stands out as an efficient architecture that supports the growth of energy-efficient TinyML accelerators.
Perika et al. (Mon,) studied this question.
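The shift-and-add multiplication with zero-aware partial-product elimination described in the abstract can be illustrated with a small behavioral model. This is only a sketch under stated assumptions, not the authors' hardware design: the function name `shift_add_mac`, the choice of the first operand as signed and the second as unsigned, and the count of added partial products as a rough proxy for switching activity are all hypothetical. Clock gating and the segmented CLA accumulator are not modeled.

```python
# Behavioral sketch (hypothetical, not the paper's RTL) of a mixed-precision
# shift-and-add MAC step with zero-aware skipping of partial products.

# Operand-width pairs listed in the abstract.
SUPPORTED_MODES = {(2, 2), (2, 4), (2, 8), (4, 4), (4, 8), (8, 8)}


def shift_add_mac(acc, a, b, a_bits=8, b_bits=8):
    """Return (acc + a * b, number of partial products actually added).

    Assumption: `a` is a signed two's-complement value of width a_bits and
    `b` is an unsigned value of width b_bits.
    """
    if tuple(sorted((a_bits, b_bits))) not in SUPPORTED_MODES:
        raise ValueError(f"unsupported mode {a_bits} x {b_bits}")
    if not -(1 << (a_bits - 1)) <= a < (1 << (a_bits - 1)):
        raise ValueError("signed operand out of range")
    if not 0 <= b < (1 << b_bits):
        raise ValueError("unsigned operand out of range")

    # Zero-aware gating: a zero operand means the multiplier does no work.
    if a == 0 or b == 0:
        return acc, 0

    product = 0
    adds = 0
    for i in range(b_bits):
        # Partial-product elimination: only set bits of b contribute.
        if (b >> i) & 1:
            product += a << i
            adds += 1
    return acc + product, adds


if __name__ == "__main__":
    print(shift_add_mac(0, -3, 5, a_bits=4, b_bits=4))
    print(shift_add_mac(10, 0, 7, a_bits=2, b_bits=4))
```

In this model a narrower unsigned operand bounds the loop length, and zero bits or zero operands reduce the number of additions, which mirrors at a very coarse level why lower precision and sparsity can save energy in such a unit.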