August 17, 2025

A Study on Transformer Optimization for Image Processing on Edge Devices

Key Points

Dynamic structured pruning reduces computational complexity while maintaining accuracy, enabling efficient image processing.
On the KITTI and Cityscapes datasets, the system achieves a 55% reduction in inference latency with less than 2% accuracy loss and improves energy efficiency by a factor of up to 3.1.
Co-designing the algorithm with FPGA and SoC hardware platforms improves performance through custom sparse kernels, memory hierarchy optimization, and energy-efficient execution techniques.
Real-world evaluations highlight the method's adaptability and robustness across diverse operating conditions.

Abstract

Transformer models have achieved groundbreaking success in computer vision tasks, yet their deployment on resource-constrained edge devices remains challenging due to high computational complexity, memory demands, and hardware inefficiencies. This paper presents a holistic optimization framework that addresses these issues for real-time image processing in edge environments, particularly in autonomous driving systems. We propose a dynamic structured pruning method that adjusts model sparsity based on real-time scene complexity, combined with post-training quantization to compress model size while preserving accuracy. In addition, we co-design the algorithm with FPGA and SoC hardware platforms, leveraging custom sparse kernels, memory hierarchy optimization, and energy-efficient execution techniques. Evaluated on the KITTI and Cityscapes datasets, our method achieves a 55% reduction in inference latency with less than a 2% loss in accuracy, and improves energy efficiency by a factor of up to 3.1. Real-world tests confirm the robustness of the system under diverse operating conditions. This work offers a scalable and adaptable solution for deploying high-performance Transformer models in edge AI applications.
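The abstract does not give implementation details, so the following is only a minimal sketch of how complexity-driven structured pruning and post-training quantization could fit together. Everything in it is an assumption made for illustration, not the authors' method: the gradient-based complexity proxy, the linear mapping to a keep ratio, the choice of attention heads as the pruned structure, the `min_keep` floor, and all function names are hypothetical.

```python
def scene_complexity(gray):
    """Hypothetical proxy: mean absolute difference between neighbouring
    pixels of an 8-bit grayscale image (list of rows), scaled to [0, 1]."""
    h, w = len(gray), len(gray[0])
    total, count = 0, 0
    for y in range(h):
        for x in range(w):
            if x + 1 < w:  # horizontal neighbour
                total += abs(gray[y][x + 1] - gray[y][x])
                count += 1
            if y + 1 < h:  # vertical neighbour
                total += abs(gray[y + 1][x] - gray[y][x])
                count += 1
    return total / (255 * count) if count else 0.0


def keep_ratio(complexity, min_keep=0.4, max_keep=1.0):
    """Assumed linear policy: simple scenes keep fewer structures."""
    c = min(max(complexity, 0.0), 1.0)
    return min_keep + (max_keep - min_keep) * c


def select_heads(importance, ratio):
    """Structured pruning step: keep the top-scoring attention heads.
    Returns the indices of the heads to execute, in ascending order."""
    k = max(1, round(ratio * len(importance)))
    ranked = sorted(range(len(importance)),
                    key=lambda i: importance[i], reverse=True)
    return sorted(ranked[:k])


def quantize_int8(weights):
    """Symmetric per-tensor post-training quantization to the int8 range.
    Returns the integer codes and the scale needed to dequantize them."""
    max_abs = max((abs(w) for w in weights), default=0.0)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    codes = [max(-127, min(127, round(w / scale))) for w in weights]
    return codes, scale


if __name__ == "__main__":
    flat_scene = [[128] * 4 for _ in range(4)]       # uniform image
    head_scores = [0.9, 0.1, 0.5, 0.7, 0.2, 0.8, 0.3, 0.4]
    ratio = keep_ratio(scene_complexity(flat_scene))
    print("heads kept:", select_heads(head_scores, ratio))
    print("quantized:", quantize_int8([0.5, -1.0, 0.25]))
```

Pruning whole heads rather than individual weights keeps the remaining computation dense, which is the property that makes structured sparsity easier to map onto custom hardware kernels. In a real system the importance scores would come from the trained model and the complexity estimate from the incoming camera frame.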
