What type of study is this?

This is an experimental study.

September 28, 2025 | Open Access

A Low Power Dynamic Bitwidth-Adaptive Multiply Accumulate Unit for TinyML Accelerators

Key Points

The proposed multiply accumulate unit achieves a power reduction of up to 80% compared to approximate MAC units.
Simulation and synthesis were conducted using the eSim EDA tool and OpenLANE on the 130 nm technology node.
The architecture supports mixed-precision modes, enhancing its adaptability for various TinyML workloads.
Zero-aware gating and clock gating techniques were employed to optimize energy consumption in TinyML accelerators.

Abstract

With growing demand for deploying machine learning models on energy-efficient, low-latency devices, TinyML stands out as an effective way to bring intelligence to resource-constrained edge devices. TinyML workloads need energy-efficient hardware for reliable model deployment, yet existing hardware often lacks such resources and cannot perform these computations efficiently. The multiply-accumulate (MAC) unit plays a key role in determining the energy efficiency of edge-constrained TinyML hardware. To bridge this gap, this work presents a novel architecture: a low-power, dynamic bit-width-adaptive 8-bit MAC unit for TinyML accelerators. The architecture provides dynamic, multi-precision, bit-width-adaptive computation, supporting mixed-precision modes such as 2 × 2, 2 × 4, 2 × 8, 4 × 4, 4 × 8 and 8 × 8 with signed × unsigned support, which makes it highly scalable for TinyML accelerators. In addition, zero-aware gating and clock gating are implemented to improve energy efficiency: a shift-and-add-based multiplier enables partial-product elimination, and a hybrid carry-lookahead adder (CLA) based accumulator enables dynamic segment-wise activation. The proposed architecture is simulated and verified with the eSim EDA tool and synthesized at the 130 nm technology node using the Google/SkyWater SKY130 PDK and the open-source OpenLANE EDA toolchain. The proposed MAC unit reduces power by 59.36%, 68.78%, 74% and 80% compared to PS4MAC, a state-of-the-art (SotA) mixed-precision MAC, the Synopsys DesignWare (DW) MAC and an approximate MAC unit, respectively. Compared to prior work, this architecture stands out as an efficient design that supports the development of energy-efficient TinyML accelerators.
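The multiplier behavior described in the abstract can be illustrated with a small behavioral sketch in Python. This is not the paper's RTL. The class `MacModel`, the helper `to_signed`, and the counters are hypothetical names introduced here. The sketch also assumes that the first operand is the signed one and the second is unsigned, which the abstract does not specify. It shows a shift-and-add multiply in which partial products for zero multiplier bits are never added, and in which a zero operand skips the multiply entirely, as a rough software stand-in for zero-aware gating. Clock gating and the segmented CLA accumulator are not modeled.

```python
from dataclasses import dataclass

# Operand width pairs (a_bits, b_bits) listed in the abstract.
SUPPORTED_MODES = {(2, 2), (2, 4), (2, 8), (4, 4), (4, 8), (8, 8)}


def to_signed(value: int, bits: int) -> int:
    """Interpret the low `bits` bits of `value` as a two's-complement number."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value & (1 << (bits - 1)) else value


@dataclass
class MacModel:
    """Hypothetical behavioral model of a bit-width-adaptive MAC (illustration only)."""
    acc: int = 0
    adds_done: int = 0      # partial products actually added
    adds_skipped: int = 0   # partial products eliminated

    def mac(self, a: int, b: int, a_bits: int, b_bits: int) -> int:
        """Accumulate signed `a` (a_bits wide) times unsigned `b` (b_bits wide)."""
        if (a_bits, b_bits) not in SUPPORTED_MODES:
            raise ValueError(f"unsupported mode {a_bits}x{b_bits}")
        a_val = to_signed(a, a_bits)          # assumption: first operand is signed
        b_val = b & ((1 << b_bits) - 1)       # assumption: second operand is unsigned
        # Zero-aware gating stand-in: a zero operand skips the whole multiply.
        if a_val == 0 or b_val == 0:
            self.adds_skipped += b_bits
            return self.acc
        product = 0
        for i in range(b_bits):
            if (b_val >> i) & 1:
                product += a_val << i         # add the shifted partial product
                self.adds_done += 1
            else:
                self.adds_skipped += 1        # zero bit: partial product eliminated
        self.acc += product
        return self.acc
```

For example, `MacModel().mac(0b11, 5, 2, 4)` treats `0b11` as the 2-bit signed value -1 and multiplies it by the 4-bit unsigned value 5, adding only the two partial products that correspond to set bits of 5. Comparing `adds_done` with `adds_skipped` over a workload gives a rough idea of how much switching activity this kind of elimination could avoid. It does not reproduce the power figures reported above.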
