Implementing non-linear activation functions such as GELU in Transformer-based models is a challenging problem, especially when both energy efficiency and high accuracy are required. To address this challenge, this paper presents a hardware-friendly optimization method for GELU deployment: an activation-aware strategy combined with integer-arithmetic GELU calculation. The method detects activation magnitudes and approximates small and extremely large activation values with linear functions, while applying a more exact polynomial approximation to all other values to preserve model accuracy. Implemented and evaluated in 22 nm CMOS technology, the proposed design improves energy efficiency by 2.14× and area efficiency by 1.41× compared with contemporary designs, while the accuracy loss is less than 1% on ViT, DeiT, and Swin models.
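The magnitude-based dispatch described above can be illustrated with a minimal sketch. It is not the paper's design: the thresholds `small` and `large` are hypothetical values chosen for illustration, the middle range borrows the second-order polynomial approximation of erf from I-BERT (constants a = -0.2888, b = -1.769) as a stand-in for the paper's polynomial, and floating point is used for readability where the paper uses integer arithmetic.

```python
import math

# Constants of the I-BERT second-order erf approximation (a stand-in,
# not necessarily the polynomial used in the paper).
A = -0.2888
B = -1.769


def gelu_exact(x: float) -> float:
    """Reference GELU: x * Phi(x), with Phi the standard normal CDF."""
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))


def gelu_piecewise(x: float, small: float = 0.05, large: float = 3.0) -> float:
    """Activation-aware GELU sketch.

    `small` and `large` are hypothetical thresholds, not values from the paper.
    """
    mag = abs(x)
    if mag < small:
        # Near zero, Phi(x) is close to 0.5, so GELU(x) is close to 0.5 * x.
        return 0.5 * x
    if mag >= large:
        # Far from zero, GELU saturates to x (positive side) or 0 (negative side).
        return x if x > 0 else 0.0
    # Middle range: polynomial approximation of erf(x / sqrt(2)).
    z = min(mag / math.sqrt(2.0), -B)
    erf_approx = math.copysign(A * (z + B) ** 2 + 1.0, x)
    return 0.5 * x * (1.0 + erf_approx)


if __name__ == "__main__":
    for v in (-4.0, -1.0, 0.01, 1.0, 4.0):
        print(f"x={v:+.2f}  exact={gelu_exact(v):+.4f}  approx={gelu_piecewise(v):+.4f}")
```

In this sketch the two linear branches need only a comparison and at most a shift, which is the kind of saving an activation-aware scheme targets; a hardware version would additionally fold the quantization scale into integer constants.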
Zou et al. (Sun,) studied this question.