February 18, 2024

Acceleration of Neural Network Inference for Embedded GPU Systems

Abstract

In this study, we propose a general method for accelerating neural network inference on GPUs for embedded systems. TensorRT is now widely used for neural network inference on embedded GPUs. However, 8-bit quantization, an efficient optimization method, is not supported by TensorRT on the NVIDIA Jetson Nano GPU. To address this, we propose an acceleration method that quantizes weights and activations without TensorRT. Comparative experiments with TensorRT-optimized frameworks demonstrate that our method effectively accelerates inference while maintaining inference accuracy.
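
To make the idea of quantizing weights and activations concrete, the following is a minimal sketch of symmetric per-tensor 8-bit quantization in plain Python. It is not the authors' implementation: the abstract does not specify the scheme, so the symmetric scaling, the [-127, 127] range, and the function names `quantize_int8` and `int8_dot` are all assumptions made for illustration.

```python
def quantize_int8(values):
    """Map floats to integers in [-127, 127] with one shared scale (assumed scheme)."""
    max_abs = max((abs(v) for v in values), default=0.0)
    # Avoid division by zero for an all-zero tensor.
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    quantized = [max(-127, min(127, round(v / scale))) for v in values]
    return quantized, scale


def int8_dot(q_act, act_scale, q_wgt, wgt_scale):
    """Dot product accumulated in integers, then rescaled back to float."""
    acc = sum(a * w for a, w in zip(q_act, q_wgt))
    return acc * act_scale * wgt_scale


# Illustrative values only, not data from the paper.
weights = [0.5, -1.0, 0.25]
activations = [2.0, 1.0, -4.0]

q_wgt, wgt_scale = quantize_int8(weights)
q_act, act_scale = quantize_int8(activations)

approx = int8_dot(q_act, act_scale, q_wgt, wgt_scale)
exact = sum(a * w for a, w in zip(activations, weights))
print(f"exact={exact:.4f} approx={approx:.4f}")
```

The multiply-accumulate runs entirely on small integers, and a single floating-point rescale recovers an approximation of the original result. This is the general reason 8-bit quantization can speed up inference with limited accuracy loss.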

Cite This Study

Terakura et al. (2024) studied this question.

synapsesocial.com/papers/68e78b93b6db6435876fd9b7 https://doi.org/10.1109/bigcomp60711.2024.00069