Abstract

As neural network models continue to grow in scale and complexity, specialized hardware accelerators have emerged to meet the increased demand for compute and memory. These accelerators employ a wide range of architectural innovations, which makes it challenging to compare them fairly and to isolate the impact of specific design decisions. Traditional evaluation metrics, such as tera operations per second (TOPS) and TOPS per watt (TOPS/W), are heavily influenced by factors outside the architecture itself, including technology node, clock frequency, design scale, and workload variations, which limits their usefulness for meaningful analysis. In this work, we propose a methodology for benchmarking neural network accelerators using Voyager, a high-level synthesis (HLS)-based accelerator generator. Voyager enables the creation of baseline accelerators that are matched in compute scale and technology node and that can run identical workloads. This enables fair, apples-to-apples comparisons across diverse accelerator architectures. We showcase this methodology through a range of case studies, including technology scaling and comparisons with state-of-the-art digital accelerators, in-memory computing-based accelerators, and sparsity-aware designs. Our results demonstrate that Voyager-generated designs serve as well-optimized baselines that enable systematic evaluation of accelerator architectures. This article is part of the discussion meeting issue ‘Bits, neurons and qubits for sustainable AI’.
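To illustrate why peak TOPS reflects design scale and clock frequency as much as architecture, the sketch below computes peak throughput for two hypothetical designs that share the same processing-element architecture but differ in MAC count and clock. It assumes the common convention that one multiply-accumulate (MAC) counts as two operations; the design parameters and power figure are made-up values for illustration, not results from this work.

```python
def peak_tops(num_macs: int, clock_hz: float) -> float:
    """Peak throughput in TOPS, assuming each MAC counts as 2 operations per cycle."""
    return 2 * num_macs * clock_hz / 1e12


def tops_per_watt(tops: float, power_w: float) -> float:
    """Efficiency metric: peak TOPS divided by power in watts."""
    return tops / power_w


def ops_per_mac_per_cycle(tops: float, num_macs: int, clock_hz: float) -> float:
    """Normalize away scale and frequency; identical architectures give identical values."""
    return tops * 1e12 / (num_macs * clock_hz)


# Hypothetical designs: same architecture, different scale and clock frequency.
small = peak_tops(num_macs=256, clock_hz=500e6)
large = peak_tops(num_macs=4096, clock_hz=1e9)

print(f"small: {small:.3f} TOPS, large: {large:.3f} TOPS, ratio: {large / small:.1f}x")
# Hypothetical 2 W power budget for the large design.
print(f"large: {tops_per_watt(large, power_w=2.0):.3f} TOPS/W")
print(ops_per_mac_per_cycle(small, 256, 500e6), ops_per_mac_per_cycle(large, 4096, 1e9))
```

The 32x gap in peak TOPS comes entirely from the 16x larger MAC array and 2x higher clock; once those are divided out, both designs perform the same work per MAC per cycle. This is the kind of confound that scale-matched baselines are meant to remove.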
Prabhu et al. studied this question.