July 9, 2023

COSA: Co-Operative Systolic Arrays for Multi-head Attention Mechanism in Neural Network using Hybrid Data Reuse and Fusion Methodologies

Abstract

Accelerating the attention mechanism is becoming increasingly vital for achieving superior performance in deep learning tasks. Existing accelerators are commonly designed around the potential sparsity of neural network (NN) models, an approach that suffers from complicated training and tuning processes as well as accuracy degradation. By systematically analyzing the inherent dataflow characteristics of the attention mechanism, we propose the Co-Operative Systolic Array (COSA) to achieve higher computational efficiency in attention acceleration. In COSA, two systolic arrays, each of which can be dynamically configured into weight-stationary or output-stationary mode, are cascaded to perform attention operations efficiently, so hybrid dataflows are supported simultaneously. Furthermore, several fusion methodologies and an advanced softmax unit are designed. Experimental results show that the COSA-based accelerator achieves a 2.95–28.82× speedup over existing designs, with a PE utilization rate of up to 97.4% and reduced memory access.
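
The abstract describes two cascaded systolic arrays, each configurable as weight stationary (WS) or output stationary (OS), that together evaluate attention. The Python sketch below is a hypothetical, cycle-stepped software model of that idea, not the authors' design. It assumes that the first array runs in OS mode to form the scores QKᵀ, that a plain max-subtracted softmax stands in for COSA's softmax unit (whose internals the abstract does not describe), and that the second array runs in WS mode with V held in its PEs to produce PV. The names `os_matmul`, `ws_matmul`, `softmax_rows`, and `attention_head` are illustrative only, and the model runs the two stages back to back rather than overlapping them as real hardware would.

```python
import math

# Hypothetical, cycle-stepped behavioural model of two systolic arrays.
# It illustrates WS/OS dataflows for one attention head; it is NOT COSA's RTL.

def os_matmul(A, B):
    """Output stationary (OS): PE(i, j) keeps C[i][j] and accumulates in place.
    Row i of A enters from the left delayed by i cycles; column j of B enters
    from the top delayed by j cycles, so matching operands meet in each PE."""
    n, kdim, m = len(A), len(B), len(B[0])
    acc = [[0.0] * m for _ in range(n)]
    a_reg = [[None] * m for _ in range(n)]  # operand of A moving right
    b_reg = [[None] * m for _ in range(n)]  # operand of B moving down
    for t in range(kdim + n + m - 2):       # fill + compute + drain cycles
        new_a = [[None] * m for _ in range(n)]
        new_b = [[None] * m for _ in range(n)]
        for i in range(n):
            for j in range(m):
                k = t - i - j               # reduction index seen by PE(i, j)
                valid = 0 <= k < kdim
                new_a[i][j] = a_reg[i][j - 1] if j else (A[i][k] if valid else None)
                new_b[i][j] = b_reg[i - 1][j] if i else (B[k][j] if valid else None)
        a_reg, b_reg = new_a, new_b
        for i in range(n):
            for j in range(m):
                if a_reg[i][j] is not None:  # b_reg is valid on the same cycles
                    acc[i][j] += a_reg[i][j] * b_reg[i][j]
    return acc

def ws_matmul(X, W):
    """Weight stationary (WS): PE(k, j) holds W[k][j]; row k of the input is fed
    from the left delayed by k cycles, and partial sums flow down each column."""
    n, kdim, m = len(X), len(W), len(W[0])
    out = [[0.0] * m for _ in range(n)]
    x_reg = [[0.0] * m for _ in range(kdim)]  # inputs moving right
    p_reg = [[0.0] * m for _ in range(kdim)]  # partial sums moving down
    for t in range(n + kdim + m - 2):
        new_x = [[0.0] * m for _ in range(kdim)]
        new_p = [[0.0] * m for _ in range(kdim)]
        for k in range(kdim):
            for j in range(m):
                x_in = x_reg[k][j - 1] if j else (X[t - k][k] if 0 <= t - k < n else 0.0)
                p_in = p_reg[k - 1][j] if k else 0.0
                new_x[k][j] = x_in
                new_p[k][j] = p_in + x_in * W[k][j]
        x_reg, p_reg = new_x, new_p
        for j in range(m):                    # finished sums leave the bottom row
            i = t - (kdim - 1) - j
            if 0 <= i < n:
                out[i][j] = p_reg[kdim - 1][j]
    return out

def softmax_rows(S):
    """Numerically stable row-wise softmax (generic stand-in, not COSA's unit)."""
    out = []
    for row in S:
        peak = max(row)
        exps = [math.exp(v - peak) for v in row]
        total = sum(exps)
        out.append([e / total for e in exps])
    return out

def transpose(M):
    return [list(col) for col in zip(*M)]

def attention_head(Q, K, V):
    """softmax(Q K^T / sqrt(d)) V using the assumed OS -> WS array mapping."""
    scale = 1.0 / math.sqrt(len(Q[0]))
    S = os_matmul(Q, transpose(K))        # array 1 in OS mode: scores
    P = softmax_rows([[s * scale for s in row] for row in S])
    return ws_matmul(P, V)                # array 2 in WS mode: V held in PEs

if __name__ == "__main__":
    Q = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    K = [[1.0, 2.0], [0.5, -1.0], [0.0, 1.0]]
    V = [[1.0, 0.0, 2.0], [0.0, 1.0, -1.0], [3.0, 1.0, 0.0]]
    for row in attention_head(Q, K, V):
        print([round(v, 4) for v in row])
```

In this simplified timing model, an (n×kdim)·(kdim×m) product takes n + kdim + m - 2 cycles in either mode; the cycles beyond the streamed dimension are pipeline fill and drain, during which some PEs sit idle. That idle time is one reason PE utilization is a meaningful metric for designs like this one.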

Cite This Study

Wang et al. (2023).

synapsesocial.com/papers/6a0ec5f206ecbe833447ca55
https://doi.org/10.1109/dac56929.2023.10247678
