October 1, 2018

A High-Speed and Low-Complexity Architecture for Softmax Function in Deep Learning

Abstract

Recently, hardware architecture design for deep neural networks (DNNs) has improved significantly. However, the hardware implementation of the softmax function, which is widely used in DNNs and involves expensive division and exponentiation units, has received little attention. This paper presents an efficient hardware implementation of the softmax function. Mathematical transformations and linear fitting are used to simplify the function. Multiple algorithmic strength reduction strategies and fast addition methods are employed to optimize the architecture. With these techniques, complicated logic units such as multipliers are eliminated and memory consumption is greatly reduced, while the accuracy loss is negligible. The proposed design is coded in a hardware description language (HDL) and synthesized under the TSMC 28-nm CMOS technology. Synthesis results show that the architecture achieves a throughput of 6.976 G/s for 8-bit input data. It achieves a power efficiency of 463.04 Gb/(mm² · mW) and occupies only 0.015 mm² of area. To the best of our knowledge, this is the first work on efficient hardware implementation of softmax in the open literature.
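The abstract does not spell out which mathematical transformations or which linear fit the authors use. As a rough illustration only, the sketch below shows one well-known way such a simplification can work: moving the normalization into the log domain removes the division, and rewriting the exponential as a power of two lets a first-order fit on the fractional part replace the exponentiation unit with a shift and an add. The function names and the specific fit (2**f ≈ 1 + f) are assumptions made for this example and are not taken from the paper.

```python
import math


def softmax_reference(xs):
    """Textbook softmax with max subtraction for numerical stability."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def softmax_log_domain(xs):
    """Division-free form: y_i = exp(x_i - m - ln(sum_j exp(x_j - m))).

    Mathematically identical to softmax_reference; the divide becomes a
    subtraction in the exponent. (Assumed transformation, for illustration.)
    """
    m = max(xs)
    shifted = [x - m for x in xs]
    log_sum = math.log(sum(math.exp(s) for s in shifted))
    return [math.exp(s - log_sum) for s in shifted]


def exp_linear_fit(v):
    """Approximate e**v via a base-2 split and a first-order fit.

    e**v = 2**u with u = v * log2(e). Split u into an integer part k and a
    fraction f in [0, 1), then use 2**f ~= 1 + f (hypothetical fit chosen
    for this sketch; its relative error stays below about 6.2%).
    In hardware, multiplying by 2**k is just a shift.
    """
    u = v * math.log2(math.e)
    k = math.floor(u)
    f = u - k
    return math.ldexp(1.0 + f, k)  # (1 + f) * 2**k


if __name__ == "__main__":
    xs = [1.0, 2.0, 3.0]
    print(softmax_reference(xs))
    print(softmax_log_domain(xs))
    print(exp_linear_fit(-1.0), math.exp(-1.0))
```

The first-order fit here is deliberately coarse, so it should not be read as reproducing the accuracy reported in the abstract; a real design would presumably tune the fit coefficients and the fixed-point word lengths.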

Authors

Meiqi Wang

Sun Yat-sen University

Siyuan Lu

Nanjing University

Danyang Zhu

Northwestern University

Institutions

Nanjing University
