September 10, 2025 | Open Access

Accelerating Deep Learning Inference: A Comparative Analysis of Modern Acceleration Frameworks

Key Points

Certain frameworks demonstrate superior inference speed and throughput, enhancing performance in deep learning applications.
The metrics evaluated include inference accuracy, inference time, throughput, memory usage, and power consumption, all of which matter on embedded devices.
Frameworks were tested on the NVIDIA Jetson AGX Orin platform, providing insights into deployment complexity and runtime efficiency.
Differences in power consumption and hardware utilization were observed, highlighting trade-offs in resource-constrained environments.

Abstract

Deep learning (DL) continues to play a pivotal role in a wide range of intelligent systems, including autonomous machines, smart surveillance, industrial automation, and portable healthcare technologies. These applications often demand low-latency inference and efficient resource utilization, especially when deployed on embedded or edge devices with limited computational capacity. As DL models become increasingly complex, selecting the right inference framework is essential to meeting performance and deployment goals. In this work, we conduct a comprehensive comparison of five widely adopted inference frameworks: PyTorch, ONNX Runtime, TensorRT, Apache TVM, and JAX. All experiments are performed on the NVIDIA Jetson AGX Orin platform, a high-performance computing solution tailored for edge artificial intelligence workloads. The evaluation considers several key performance metrics, including inference accuracy, inference time, throughput, memory usage, and power consumption. Each framework is tested using a wide range of convolutional and transformer models and analyzed in terms of deployment complexity, runtime efficiency, and hardware utilization. Our results show that certain frameworks offer superior inference speed and throughput, while others provide advantages in flexibility, portability, or ease of integration. We also observe meaningful differences in how each framework manages system memory and power under various load conditions. This study offers practical insights into the trade-offs associated with deploying DL inference on resource-constrained hardware.
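
The abstract lists inference time and throughput among the evaluated metrics but does not describe the measurement harness. As a rough illustration only, and not the authors' procedure, the two quantities can be derived from repeated timed calls. The names `benchmark` and `dummy_infer`, the warm-up count, and the iteration count below are all assumptions made for this sketch.

```python
import statistics
import time
from typing import Callable, Dict


def benchmark(infer: Callable[[], object], batch_size: int,
              warmup: int = 10, iters: int = 100) -> Dict[str, float]:
    """Time a zero-argument inference callable and summarise the results."""
    # Warm-up runs absorb one-off costs (lazy initialisation, JIT compilation).
    for _ in range(warmup):
        infer()

    latencies_s = []
    for _ in range(iters):
        start = time.perf_counter()
        # For GPU back ends, `infer` must block until the result is ready,
        # otherwise only the kernel launch is timed.
        infer()
        latencies_s.append(time.perf_counter() - start)

    total_s = sum(latencies_s)
    return {
        "mean_latency_ms": 1000.0 * statistics.mean(latencies_s),
        "median_latency_ms": 1000.0 * statistics.median(latencies_s),
        # Samples processed per second across all timed iterations.
        "throughput_samples_per_s": batch_size * iters / total_s,
    }


def dummy_infer() -> int:
    # Hypothetical stand-in workload, not a real model.
    return sum(i * i for i in range(10_000))


if __name__ == "__main__":
    print(benchmark(dummy_infer, batch_size=8))
```

To time a real framework, `dummy_infer` would be replaced by a closure that runs one forward pass on a fixed batch and waits for completion, for example by calling `torch.cuda.synchronize()` in PyTorch or `block_until_ready()` on a JAX output array.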
