What question did this study set out to answer?

This research aims to enhance the performance of Vision Transformers on edge devices by addressing computational and memory bottlenecks.

December 11, 2025

REATA: An Efficient Vision Transformer Accelerator Featuring a Resource-Optimized Attention Design on Versal ACAP

Key Points

Proposed a modular and adaptive architecture for Vision Transformers targeting the AMD Versal ACAP platform.
Introduced a resource-efficient attention computation module localized within AI Engine core clusters.
Developed a resource-aware multi-stage pipeline scheduling strategy for feed-forward networks.
Achieved 33.2 TOPS throughput at INT8 precision, outperforming the EQ-ViT accelerator by 27%.
Maintained a competitive energy efficiency of 510.6 GOPS/W.
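
As a rough sanity check on the two headline figures, the short Python sketch below derives what they imply. It assumes that the throughput and efficiency numbers describe the same operating point and that the 27% gain refers to throughput; neither the implied power nor the implied EQ-ViT figure is stated in the source, and the function names are illustrative only.

```python
# Figures quoted in the key points.
throughput_tops = 33.2          # INT8 throughput, in TOPS
efficiency_gops_per_w = 510.6   # energy efficiency, in GOPS/W
speedup_over_eq_vit = 1.27      # "outperforming ... by 27%" (assumed to mean throughput)


def implied_power_watts(tops, gops_per_w):
    """Power implied by throughput / efficiency (1 TOPS = 1000 GOPS)."""
    return tops * 1000.0 / gops_per_w


def implied_baseline_tops(tops, speedup):
    """Baseline throughput implied by a relative speedup factor."""
    return tops / speedup


power_w = implied_power_watts(throughput_tops, efficiency_gops_per_w)
baseline_tops = implied_baseline_tops(throughput_tops, speedup_over_eq_vit)

print(f"implied power: {power_w:.1f} W")                      # implied power: 65.0 W
print(f"implied EQ-ViT throughput: {baseline_tops:.1f} TOPS")  # implied EQ-ViT throughput: 26.1 TOPS
```

These derived values are estimates under the stated assumptions, not measurements reported by the study.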

Abstract

Deploying Vision Transformers (ViTs) on edge devices poses significant challenges due to their high computational demands and memory access overheads, which severely hinder real-time inference efficiency. This paper proposes a modular and adaptive ViT acceleration architecture targeting the AMD Versal ACAP platform. By leveraging heterogeneous resource collaboration and fine-grained dataflow optimizations, the proposed design addresses performance bottlenecks effectively. We introduce a resource-efficient attention computation module that localizes self-attention operations within AI Engine (AIE) core clusters, thereby reducing inter-module communication and minimizing MAC resource usage. In parallel, a resource-aware multi-stage pipeline scheduling strategy dynamically partitions and parallelizes the computation-intensive feed-forward network (FFN), improving computation reuse and module-level coordination. The architecture integrates parameter tiling and a PLIO-based broadcasting mechanism to construct a decoupled compute-communication dataflow engine, alleviating memory bottlenecks. Experimental results on the Xilinx VCK5000 ACAP platform demonstrate that the proposed design achieves 33.2 TOPS throughput at INT8 precision—outperforming the state-of-the-art EQ-ViT accelerator by 27%—while maintaining a competitive efficiency of 510.6 GOPS/W. Scalability evaluations on ViT-Base and DeiT-Tiny confirm the design’s adaptability in edge scenarios, offering a resource-efficient and reconfigurable hardware paradigm for high-density Transformer inference.
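
To make the idea of parameter tiling at INT8 precision more concrete, here is a minimal, hardware-agnostic Python sketch of a tiled integer matrix multiply. It is not the paper's dataflow: it does not model AIE cores, PLIO broadcasting, or pipeline scheduling, and the function name `matmul_int8_tiled` and the `tile` parameter are hypothetical. It only shows that splitting a matrix product into small blocks, each accumulated in a wider integer, gives the same result as the untiled product.

```python
def matmul_int8_tiled(a, b, tile=2):
    """Multiply integer matrices a (M x K) and b (K x N) one tile at a time.

    Inputs must fit in INT8. Python integers stand in for the wider
    accumulator that a hardware MAC unit would typically use.
    """
    m, k, n = len(a), len(b), len(b[0])
    for row in a + b:
        assert all(-128 <= v <= 127 for v in row), "inputs must fit in INT8"

    out = [[0] * n for _ in range(m)]
    # Walk over output tiles (i0, j0), then over the shared dimension (k0),
    # so that only a tile-sized block of each operand is touched at a time.
    for i0 in range(0, m, tile):
        for j0 in range(0, n, tile):
            for k0 in range(0, k, tile):
                for i in range(i0, min(i0 + tile, m)):
                    for j in range(j0, min(j0 + tile, n)):
                        acc = out[i][j]
                        for kk in range(k0, min(k0 + tile, k)):
                            acc += a[i][kk] * b[kk][j]
                        out[i][j] = acc
    return out


a = [[1, -2, 3], [4, 5, -6]]
b = [[7, 8], [9, -10], [11, 12]]
print(matmul_int8_tiled(a, b, tile=2))  # [[22, 64], [7, -90]]
```

The tile size does not change the result; on real hardware it would be chosen to match local memory capacity, which this sketch does not attempt to model.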
