September 5, 2025 · Open Access

Co-optimized Vision Transformer Deployment on Edge Devices: Algorithm-Hardware-Compiler 3D Evolution

Key Points

The proposed framework reduces PackQViT latency to 12.3 ms, enhancing edge deployment efficiency.
It achieves 62 img/s throughput with DynamicViT while maintaining or improving accuracy relative to the original Vision Transformer.
The work focuses on algorithm compression and hardware-aware acceleration to address edge device limitations.
Challenges such as privacy, energy efficiency, and quantization stability are critically examined.

Abstract

The Vision Transformer (ViT), with its attention mechanism, performs strongly on visual tasks, but its high computational complexity and memory requirements (for example, ViT-Base at 224 × 224 input requires 17.6 GFLOPs and more than 2 GB of FP32 inference memory) limit its deployment on resource-constrained edge devices. In this paper, we propose a collaborative optimization framework that combines algorithm compression, hardware-aware acceleration, and compiler optimization, with a special focus on two potential breakthrough technologies of 2025: the MambaVision hybrid architecture and PH-Reg dynamic robustness enhancement. Through reliable optimization methods, the framework reduces PackQViT latency to 12.3 ms, achieves 62 img/s throughput with DynamicViT, and maintains or improves on the ViT-Base accuracy of 84.6% (e.g., PackQViT reaches 85.2%). In addition, challenges such as the generalization of ultra-low-precision quantization, dynamic architecture stability, cross-device collaboration, and the balance between privacy and energy efficiency are also explored.
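
The figures quoted in the abstract can be put in context with some back-of-the-envelope arithmetic. The sketch below is illustrative only and is not taken from the paper: the helper names are hypothetical, a batch size of 1 is assumed for the latency conversion, and the parameter count of roughly 86 million for ViT-Base is a commonly cited value rather than a number stated in the abstract.

```python
def latency_to_throughput(latency_ms: float, batch_size: int = 1) -> float:
    """Images per second implied by a per-batch latency in milliseconds."""
    return batch_size * 1000.0 / latency_ms


def weight_memory_mb(num_params: float, bits_per_weight: int) -> float:
    """Weight storage in megabytes (10^6 bytes); activations are ignored."""
    return num_params * bits_per_weight / 8 / 1e6


# 12.3 ms is the PackQViT latency from the abstract; batch size 1 is assumed.
packqvit_img_per_s = latency_to_throughput(12.3)

# Assumption: ~86 million parameters for ViT-Base (not stated in the abstract).
VIT_BASE_PARAMS = 86e6
fp32_mb = weight_memory_mb(VIT_BASE_PARAMS, 32)
int8_mb = weight_memory_mb(VIT_BASE_PARAMS, 8)
int4_mb = weight_memory_mb(VIT_BASE_PARAMS, 4)

print(f"{packqvit_img_per_s:.1f} img/s at batch size 1")
print(f"FP32 {fp32_mb:.0f} MB, INT8 {int8_mb:.0f} MB, INT4 {int4_mb:.0f} MB")
```

Under these assumptions the weights alone occupy about 344 MB in FP32, so the figure of more than 2 GB presumably also covers activations and runtime buffers. The 62 img/s throughput is reported for DynamicViT, a different model from PackQViT, so it should not be compared directly with the throughput implied by the 12.3 ms latency.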

Cite this study

Yifan Wu studied this question.

synapsesocial.com/papers/68bb46bd6d6d5674bccfebdf — DOI: https://doi.org/10.54097/b7d7w798

Also consider

Synapse has enriched 5 closely related papers on similar research questions. Consider them for comparative context:

Quasar-ViT: Hardware-Oriented Quantization-Aware Architecture Search for Vision Transformers · 2024 · 8 citations
MambaVision: A Hybrid Mamba-Transformer Vision Backbone · 2025 · 182 citations
ENNA: An Efficient Neural Network Accelerator Design Based on ADC-Free Compute-In-Memory Subarrays · 2022 · 32 citations
Hardware and Software Optimizations for Accelerating Deep Neural Networks: Survey of Current Trends, Challenges, and the Road Ahead · 2020 · 211 citations
LSQ+: Improving low-bit quantization through learnable offsets and better initialization · 2020 · 214 citations

Authors

Yifan Wu

Scuola Superiore Sant'Anna

Journals

Journal of Computing and Electronic Information Management
