What question did this study set out to answer?

The aim is to develop hardware accelerators designed specifically for machine learning tasks, focusing on memory efficiency.

October 28, 2016


DianNao family

Yunji Chen (Chinese Academy of Sciences); Tianshi Chen (Shenzhen Research Institute of Big Data); Zhiwei Xu (Ningbo University)

Key Points

Designed the DianNao family, a series of hardware accelerators for machine learning, especially neural networks.
Treated memory transfer efficiency as a first-order design concern, as in traditional processors, rather than a second-step consideration.
Evaluated the speed and energy consumption of a 64-chip DaDianNao system against a GPU on a number of representative neural network layers.
Achieved a speedup of 450.65x over the GPU.
Reduced energy consumption by an average factor of 150.31x.

Abstract

Machine Learning (ML) tasks are becoming pervasive in a broad range of applications, and in a broad range of systems (from embedded systems to data centers). As computer architectures evolve toward heterogeneous multi-cores composed of a mix of cores and hardware accelerators, designing hardware accelerators for ML techniques can simultaneously achieve high efficiency and broad application scope. While efficient computational primitives are important for a hardware accelerator, inefficient memory transfers can potentially void the throughput, energy, or cost advantages of accelerators, that is, an Amdahl's law effect, and thus, they should become a first-order concern, just like in processors, rather than an element factored in accelerator design on a second step. In this article, we introduce a series of hardware accelerators (i.e., the DianNao family) designed for ML (especially neural networks), with a special emphasis on the impact of memory on accelerator design, performance, and energy. We show that, on a number of representative neural network layers, it is possible to achieve a speedup of 450.65x over a GPU, and reduce the energy by 150.31x on average for a 64-chip DaDianNao system (a member of the DianNao family).
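The Amdahl's law effect mentioned in the abstract can be made concrete with a small calculation. The sketch below is illustrative only and does not come from the paper: it assumes a workload whose runtime splits into a compute portion that the accelerator speeds up and a memory-transfer portion that it does not. The function name `amdahl_speedup` and the 90% / 99% compute fractions are hypothetical.

```python
def amdahl_speedup(accelerated_fraction: float, accel_factor: float) -> float:
    """Overall speedup under Amdahl's law.

    Hypothetical model: `accelerated_fraction` of the original runtime (e.g. compute)
    is sped up by `accel_factor`; the remainder (e.g. memory transfers) is unchanged.
    """
    if not 0.0 <= accelerated_fraction <= 1.0:
        raise ValueError("accelerated_fraction must be in [0, 1]")
    if accel_factor <= 0:
        raise ValueError("accel_factor must be positive")
    # New runtime, normalized to the original runtime of 1.0.
    new_runtime = (1.0 - accelerated_fraction) + accelerated_fraction / accel_factor
    return 1.0 / new_runtime


if __name__ == "__main__":
    # A 100x faster compute engine, with memory transfers left untouched.
    print(f"{amdahl_speedup(0.90, 100):.2f}")  # 9.17: capped below 1 / 0.10 = 10x
    print(f"{amdahl_speedup(0.99, 100):.2f}")  # 50.25
```

Even with a 100x faster compute engine, if unaccelerated memory transfers take 10% of the original runtime, the overall speedup can never exceed 10x. This is why the abstract argues that memory must be a first-order design concern rather than an afterthought.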
