Machine Learning (ML) tasks are becoming pervasive across a broad range of applications and systems, from embedded systems to data centers. As computer architectures evolve toward heterogeneous multi-cores composed of a mix of cores and hardware accelerators, designing hardware accelerators for ML techniques can simultaneously achieve high efficiency and broad application scope. While efficient computational primitives are important for a hardware accelerator, inefficient memory transfers can potentially void the throughput, energy, or cost advantages of accelerators (an Amdahl's law effect). Memory transfers should therefore be a first-order concern, just as in processors, rather than an element factored into accelerator design as a second step. In this article, we introduce a series of hardware accelerators (the DianNao family) designed for ML, especially neural networks, with a special emphasis on the impact of memory on accelerator design, performance, and energy. We show that, on a number of representative neural network layers, it is possible to achieve a speedup of 450.65x over a GPU and reduce energy by 150.31x on average with a 64-chip DaDianNao system (a member of the DianNao family).
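The Amdahl's law effect mentioned above can be illustrated with a minimal sketch. It assumes execution time splits into a fraction that the accelerator speeds up (computation) and a remainder that it does not (here, memory transfers). The function name `amdahl_speedup` and the example fractions are hypothetical and are not taken from the article.

```python
def amdahl_speedup(accelerated_fraction: float, accel_factor: float) -> float:
    """Overall speedup when only part of the run time is accelerated.

    accelerated_fraction: share of the original run time that the
        accelerator speeds up (between 0 and 1).
    accel_factor: speedup applied to that share (greater than 0).
    Both parameters are illustrative assumptions, not measured values.
    """
    if not 0.0 <= accelerated_fraction <= 1.0:
        raise ValueError("accelerated_fraction must lie in [0, 1]")
    if accel_factor <= 0.0:
        raise ValueError("accel_factor must be positive")
    # Amdahl's law: the unaccelerated share keeps its full cost,
    # the accelerated share shrinks by accel_factor.
    unaccelerated = 1.0 - accelerated_fraction
    return 1.0 / (unaccelerated + accelerated_fraction / accel_factor)


if __name__ == "__main__":
    # Hypothetical workload: 90% computation, 10% memory transfers.
    for factor in (10.0, 100.0, 1000.0):
        print(f"compute {factor:>6.0f}x faster -> overall "
              f"{amdahl_speedup(0.9, factor):.2f}x")
```

Under these assumed numbers, the overall speedup stays below 10x however fast the computational primitives become, because the unaccelerated 10% dominates. This is the sense in which memory transfers can cancel an accelerator's advantage.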
Yunji Chen
Chinese Academy of Sciences
Tianshi Chen
Shenzhen Research Institute of Big Data
Zhiwei Xu
Ningbo University
Communications of the ACM
Inria Saclay - Île de France
International Centre for Theoretical Physics Asia-Pacific
Chen et al. studied this question.
synapsesocial.com/papers/6a0a564ec72bf9c3ae116b0f — DOI: https://doi.org/10.1145/2996864