February 1, 2021

In-Memory Computing based Accelerator for Transformer Networks for Long Sequences

Abstract

Transformer networks have outperformed recurrent neural networks and convolutional neural networks in various sequential tasks. However, scaling transformer networks to long sequences has been challenging because of memory and compute bottlenecks. Transformer networks are limited by memory bandwidth: their low operations-per-byte ratio leaves much of a GPU's compute capacity underutilized. In-memory processing can mitigate this bottleneck by eliminating data-transfer time between memory and compute units. Furthermore, transformer networks use neural attention mechanisms to characterize the relationships between sequence elements. Hardware solutions proposed for attention mechanisms include ternary content-addressable memories (TCAMs), crossbar arrays (XBars), and processing-in-memory (PIM); however, none of these solutions implements multi-head self-attention. We propose using a combination of XBars and CAMs to accelerate transformer networks. We improve the speed of transformer networks by (1) computing in memory, thus minimizing memory-transfer overhead, (2) caching reusable parameters to reduce the number of operations, (3) exploiting the parallelism available in the attention mechanism, and (4) using locality-sensitive hashing (LSH) to filter sequence elements by importance. Our approach achieves a 200x speedup and a 41x energy improvement for a sequence length of 4098.
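To make point (4) more concrete, the following is a minimal sketch of LSH-filtered attention in plain Python. It assumes random-hyperplane (SimHash-style) hashing and a Hamming-distance threshold as a software stand-in for a CAM search over stored key signatures. The abstract does not specify the hash family, signature width, or matching rule, so `n_bits`, `max_dist`, and all function names here are hypothetical. Each query attends only to keys whose signatures fall within the threshold of its own, so the softmax runs over a filtered subset rather than the whole sequence. A multi-head layer would apply the same procedure to each head independently.

```python
import math
import random

# Hypothetical sketch: LSH-filtered single-head attention. Random-hyperplane
# hashing and a Hamming-distance threshold stand in for the paper's
# (unspecified) hashing scheme and CAM matching rule.


def random_hyperplanes(dim, n_bits, seed=0):
    """Draw n_bits random Gaussian hyperplanes through the origin."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_bits)]


def lsh_signature(vec, planes):
    """One bit per hyperplane: 1 if vec lies on its non-negative side."""
    return tuple(1 if sum(p * v for p, v in zip(plane, vec)) >= 0 else 0
                 for plane in planes)


def hamming(a, b):
    """Number of differing bits between two signatures."""
    return sum(x != y for x, y in zip(a, b))


def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def lsh_filtered_attention(queries, keys, values, n_bits=8, max_dist=1, seed=0):
    """Return (outputs, number of keys kept per query)."""
    dim = len(queries[0])
    planes = random_hyperplanes(dim, n_bits, seed)
    # Key signatures are computed once; in hardware they would sit in a CAM.
    key_sigs = [lsh_signature(k, planes) for k in keys]
    outputs, kept = [], []
    for q in queries:
        q_sig = lsh_signature(q, planes)
        # Software stand-in for a parallel CAM search over all stored signatures.
        dists = [hamming(q_sig, s) for s in key_sigs]
        cands = [j for j, d in enumerate(dists) if d <= max_dist]
        if not cands:
            # Fall back to the closest signatures so every query attends to something.
            best = min(dists)
            cands = [j for j, d in enumerate(dists) if d == best]
        # Scaled dot-product attention restricted to the candidate keys.
        scores = [sum(a * b for a, b in zip(q, keys[j])) / math.sqrt(dim)
                  for j in cands]
        weights = softmax(scores)
        outputs.append([sum(w * values[j][c] for w, j in zip(weights, cands))
                        for c in range(len(values[0]))])
        kept.append(len(cands))
    return outputs, kept


if __name__ == "__main__":
    rng = random.Random(1)
    seq_len, dim = 256, 16
    Q = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(seq_len)]
    K = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(seq_len)]
    V = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(seq_len)]
    _, kept = lsh_filtered_attention(Q, K, V, n_bits=8, max_dist=1)
    print(f"average keys attended per query: {sum(kept) / len(kept):.1f} of {seq_len}")
```

Setting `max_dist` equal to `n_bits` keeps every key and reduces the sketch to dense scaled dot-product attention, which makes a convenient sanity check; smaller thresholds trade accuracy for fewer dot products per query.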

Authors

Ann Franchesca Laguna (De La Salle University)
Arman Kazemi (University of Notre Dame)
Michael Niemier (University of Notre Dame)