Vision transformers (ViTs) have emerged as one of the most popular classes of computer vision models, achieving remarkable performance in image recognition. However, ViTs require large-scale, high-dimensional matrix computations, and traditional digital accelerators such as graphics processing units (GPUs) are constrained by memory bandwidth, which leads to higher latency, increased energy consumption, and larger area. To address this challenge, this paper proposes a memristor-based analog accelerator that leverages memristor crossbar arrays for in-memory computing, reducing data movement and improving computational efficiency. To account for the non-ideal characteristics of memristor devices and the influence of analog circuitry, we incorporate Gaussian-distributed analog computation error at each step, together with memristor non-ideality modeling, into ViT inference, enabling realistic evaluation under hardware-level conditions. Experimental evaluation on the ImageNet-1k dataset with TIMM-pretrained ViT models shows that the proposed analog accelerator achieves the same Top-1 accuracy as a custom-designed 5 nm digital baseline accelerator, even with ~35% analog computation error and ~10% memristor conductance variation injected at each step. Compared to the digital counterpart, the proposed design achieves an 11.9× reduction in energy-delay product (EDP) and a 137.2× reduction in energy-delay-area product (EDAP).
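The error injection described above can be illustrated with a minimal NumPy sketch of a single analog matrix multiplication. This is not the paper's simulator: the function name `noisy_crossbar_matmul`, the parameters `sigma_g` and `sigma_out`, and the choice to interpret "~10% conductance variation" as a multiplicative Gaussian perturbation of each weight and "~35% computation error" as additive Gaussian noise scaled to the RMS of the output are all assumptions made for illustration.

```python
import numpy as np


def noisy_crossbar_matmul(x, w, sigma_g=0.10, sigma_out=0.35, rng=None):
    """Hypothetical model of one analog matrix-multiply step.

    x: (batch, in_features) activations
    w: (in_features, out_features) weights mapped to conductances
    sigma_g: assumed relative std of per-device conductance variation
    sigma_out: assumed relative std of the analog computation error
    """
    rng = np.random.default_rng() if rng is None else rng

    # Assumption: device variation acts as a multiplicative Gaussian
    # perturbation on every stored weight.
    w_dev = w * (1.0 + sigma_g * rng.standard_normal(w.shape))

    # The crossbar computes the product with the perturbed weights.
    y = x @ w_dev

    # Assumption: analog circuit error is additive Gaussian noise whose
    # std is a fraction of the RMS magnitude of the output.
    rms = np.sqrt(np.mean(y ** 2))
    return y + sigma_out * rms * rng.standard_normal(y.shape)


# Usage: compare one noisy step against the ideal digital result.
rng = np.random.default_rng(0)
x = rng.standard_normal((4, 64))
w = rng.standard_normal((64, 32))

y_ideal = x @ w
y_noisy = noisy_crossbar_matmul(x, w, rng=rng)

rel_err = np.linalg.norm(y_noisy - y_ideal) / np.linalg.norm(y_ideal)
print(f"relative error of one noisy step: {rel_err:.3f}")
```

In a full evaluation, a function like this would stand in for every matrix multiplication in the ViT forward pass so that the noise is applied at each step; how the paper actually defines and scales the two error terms may differ from the assumptions used here.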
Qu et al. (Sun,) studied this question.