What question did this study set out to answer?

The aim is to develop a low-precision softmax approximation to enable efficient hardware realization of vision transformers on resource-constrained FPGAs.

April 25, 2026 | Open Access

A Generalizable Low-Precision Softmax Approximation for Small-FPGA Deployment of Vision Transformers

Key points

Proposed a truncated Maclaurin-series exponential with input-range clamping for softmax computation.
Validated the method using SwinIR super resolution to assess reconstruction fidelity.
Analyzed FPGA deployment, using a Horner-form reformulation for improved efficiency.
Achieved near-reference performance with only 0.15 dB PSNR and 0.0059 SSIM degradation for SwinIR-M.
Observed a PSNR increase with less than 0.01 SSIM loss for the larger SwinIR-L model.
Demonstrated significant reduction in precision and memory requirements without sacrificing quality.

Abstract

Softmax is a fundamental yet computationally expensive operation in vision transformer attention, posing significant challenges for deployment on resource-constrained FPGAs (Field Programmable Gate Arrays). Precision demands, and with them internal buffer sizes, grow rapidly at the softmax stage of the attention pipeline, mainly because softmax relies on the exponential function. This paper proposes a low-precision softmax approximation that combines a truncated Maclaurin-series exponential with input-range clamping to enable efficient hardware realization without sacrificing reconstruction quality. By bounding extreme attention scores that contribute negligibly to final outputs, the proposed method mitigates the instability of low-order polynomial approximations while preserving their hardware efficiency. The approach is first validated in software using SwinIR (Image Restoration Using Swin Transformer) super resolution to ensure reconstruction fidelity and is then analyzed for FPGA deployment. SwinIR is a multi-stage counterpart of transformers such as DeiT and ViT, which makes it a preferred testbed for assessing how the change affects reconstruction fidelity in transformers. Experimental results demonstrate that the proposed fourth-order clamped approximation achieves near-reference performance, incurring only 0.15 dB PSNR and 0.0059 SSIM degradation on SwinIR-M, while significantly reducing precision and memory requirements. For the large SwinIR model (SwinIR-L), a PSNR increase with an SSIM loss of less than 0.01 is observed, further suggesting that extreme values matter less as model size grows. A Horner-form reformulation further improves hardware efficiency by limiting intermediate precision growth.
Overall, this work presents a reconstruction-aware and hardware-friendly softmax reformulation that enables practical deployment of vision transformers on small FPGA platforms. The contribution is also used to improve the performance of the ViTA accelerator design, to which bias initialization and a runtime variable for the PE loop bound are added.
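The idea described in the abstract can be illustrated with a minimal floating-point sketch. It is not the authors' implementation: the function names, the subtraction of the row maximum before clamping, and the clamp bound of 1.5 are all assumptions made for illustration, since the abstract does not state the clamping range or the fixed-point format. The sketch evaluates the fourth-order truncated Maclaurin series of the exponential in Horner form, so each step is a single multiply-add on one accumulator rather than a separate computation of each power of the input.

```python
import math

# Maclaurin coefficients 1/k! for k = 0..4 (fourth-order truncation).
COEFFS = [1.0, 1.0, 1.0 / 2.0, 1.0 / 6.0, 1.0 / 24.0]


def exp_horner(z, coeffs=COEFFS):
    """Evaluate the truncated Maclaurin series of exp(z) in Horner form.

    Computes c0 + z*(c1 + z*(c2 + z*(c3 + z*c4))) with one multiply-add
    per coefficient.
    """
    acc = coeffs[-1]
    for c in reversed(coeffs[:-1]):
        acc = acc * z + c
    return acc


def clamped_softmax(scores, clamp=1.5):
    """Softmax over one row of attention scores using the polynomial exp.

    Assumptions (not taken from the paper): scores are shifted by the row
    maximum and then clamped to [-clamp, 0]; clamp=1.5 is a hypothetical
    placeholder value.
    """
    top = max(scores)
    shifted = [max(s - top, -clamp) for s in scores]
    weights = [exp_horner(z) for z in shifted]
    total = sum(weights)
    return [w / total for w in weights]


def reference_softmax(scores):
    """Standard softmax with the exact exponential, for comparison."""
    top = max(scores)
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    return [w / total for w in weights]


if __name__ == "__main__":
    row = [0.3, -0.2, 1.1, -4.0]
    approx = clamped_softmax(row)
    exact = reference_softmax(row)
    for a, e in zip(approx, exact):
        print(f"approx={a:.4f}  exact={e:.4f}")
```

Running the script prints the approximate and exact weights side by side for one example row. In this sketch the clamp keeps the polynomial away from the region where a low-order truncation diverges from the true exponential, at the cost of overweighting strongly negative scores; how much that matters for reconstruction quality is what the paper's SwinIR experiments measure.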
