Softmax is a fundamental yet computationally expensive operation in vision transformer attention, posing significant challenges for deployment on resource-constrained FPGAs (Field Programmable Gate Arrays). Because softmax relies on the exponential function, internal buffer sizes and computational precision demands grow rapidly at this stage of the attention pipeline. This paper proposes a low-precision softmax approximation that combines a truncated Maclaurin-series exponential with input-range clamping to enable efficient hardware realization without sacrificing reconstruction quality. By bounding extreme attention scores that contribute negligibly to the final output, the proposed method mitigates the instability of low-order polynomial approximations while preserving their hardware efficiency. The approach is first validated in software using SwinIR (image restoration using the Swin Transformer) super-resolution to ensure reconstruction fidelity and is then analyzed for FPGA deployment. SwinIR is a multi-stage counterpart of transformers such as DeiT and ViT, which makes it a suitable testbed for assessing how the change affects reconstruction fidelity in transformers. Experimental results demonstrate that the proposed fourth-order clamped approximation achieves near-reference performance, incurring only 0.15 dB PSNR and 0.0059 SSIM degradation on SwinIR-M, while significantly reducing precision and memory requirements. For the large model (SwinIR-L), a PSNR increase with an SSIM loss of less than 0.01 is observed, further suggesting that extreme values matter less as model size grows. A Horner-form reformulation further improves hardware efficiency by limiting intermediate precision growth.
Overall, this work presents a reconstruction-aware and hardware-friendly softmax reformulation that enables practical deployment of vision transformers on small FPGA platforms. We apply this reformulation to improve the performance of the ViTA accelerator design, and we additionally add bias initialization and a runtime variable for the PE loop bound to that design.
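The clamped, fourth-order Maclaurin softmax in Horner form described above can be illustrated with a minimal floating-point sketch. The function names, the clamp bounds `lo` and `hi`, and the choice to clamp raw scores before normalization are assumptions made for illustration only; the abstract does not state the actual clamp range or the fixed-point formats used in hardware. The lower bound is kept above roughly -1.6 because the fourth-order polynomial stops increasing with its input below that point.

```python
def exp_maclaurin4_horner(x):
    """Fourth-order Maclaurin approximation of exp(x), evaluated in Horner form.

    Expands to 1 + x + x^2/2 + x^3/6 + x^4/24, but is computed as nested
    multiply-adds so that no explicit power of x is ever formed.
    """
    return 1.0 + x * (1.0 + x * (0.5 + x * (1.0 / 6.0 + x * (1.0 / 24.0))))


def clamped_softmax(scores, lo=-1.5, hi=1.5):
    """Softmax using the polynomial exponential on clamped scores.

    `lo` and `hi` are hypothetical clamp bounds chosen for this sketch;
    they are not values taken from the paper.
    """
    # Clamp each score into [lo, hi] so the polynomial is only evaluated
    # where it stays positive and increasing.
    clamped = [min(max(s, lo), hi) for s in scores]
    exps = [exp_maclaurin4_horner(z) for z in clamped]
    total = sum(exps)
    return [e / total for e in exps]


if __name__ == "__main__":
    # Print the approximate attention weights for a small example row.
    print(clamped_softmax([0.2, -0.3, 0.5, 4.0]))
```

The Horner form needs four multiplications and four additions per exponential, and each intermediate value is a low-degree partial result instead of a separate power of the input, which is the property the abstract credits with limiting intermediate precision growth.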
Aboagye et al. (Wed,) studied this question.