Accelerating the attention mechanism is becoming increasingly vital to achieving superior performance in deep learning tasks. Existing accelerators are typically designed to exploit the potential sparsity in neural network (NN) models, an approach that suffers from complicated training and tuning processes and from accuracy degradation. By systematically analyzing the inherent dataflow characteristics of the attention mechanism, we propose the Co-Operative Systolic Array (COSA) to accelerate it with higher computational efficiency. In COSA, two systolic arrays, each of which can be dynamically configured into weight-stationary or output-stationary mode, are cascaded to enable efficient attention operation, so hybrid dataflows are supported simultaneously. Furthermore, various fusion methodologies and an advanced softmax unit are designed. Experimental results show that the COSA-based accelerator achieves a 2.95-28.82× speedup over existing designs, with a PE utilization rate of up to 97.4% and fewer memory accesses.
Wang et al. (Sun,) studied this question.
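To make the cascaded structure in the abstract concrete, the sketch below writes standard scaled dot-product attention as two chained matrix multiplications with a row-wise softmax between them. It is a functional software illustration only, not the COSA hardware or its dataflow: the helper names (`matmul`, `transpose`, `softmax_rows`, `attention`) are hypothetical, and the idea that each multiplication corresponds to one of the two systolic arrays is an assumption made for illustration, since the abstract does not state the mapping.

```python
import math


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    inner = len(b)
    assert all(len(row) == inner for row in a), "shape mismatch"
    cols = len(b[0])
    return [[sum(row[k] * b[k][j] for k in range(inner)) for j in range(cols)]
            for row in a]


def transpose(m):
    """Return the transpose of a matrix given as a list of rows."""
    return [list(col) for col in zip(*m)]


def softmax_rows(m):
    """Row-wise softmax, subtracting each row's max for numerical stability."""
    out = []
    for row in m:
        peak = max(row)
        exps = [math.exp(x - peak) for x in row]
        total = sum(exps)
        out.append([e / total for e in exps])
    return out


def attention(q, k, v):
    """Scaled dot-product attention: softmax(Q K^T / sqrt(d_k)) V."""
    d_k = len(q[0])
    # Stage 1 (assumed to correspond to the first array): score matrix Q K^T.
    scores = matmul(q, transpose(k))
    scaled = [[s / math.sqrt(d_k) for s in row] for row in scores]
    # Softmax sits between the two matrix multiplications.
    probs = softmax_rows(scaled)
    # Stage 2 (assumed to correspond to the second array): weighted sum over V.
    return matmul(probs, v)
```

Because the second multiplication consumes the first one's result directly, the two stages form a natural pipeline, which is the property a cascaded pair of arrays could exploit. With an all-zero query, every key gets equal weight, so the output is simply the mean of the rows of `v`.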