What question did this study set out to answer?

This research aims to improve the efficiency of deep learning inference on embedded systems through an operation-level scheduling framework.

February 2, 2026 · Open Access

Operation‐level scheduling framework for efficient deep learning inference on embedded systems using directed acyclic graphs

Key points

Developed a three-stage scheduling framework for deep learning inference.
Profiled operation latencies offline across different devices and input sizes.
Trained latency prediction models on input-aware features that capture variation in input dimensions.
Utilized directed acyclic graphs for scheduling operations on CPUs and GPUs.
Achieved inference latency reduction of up to 74% across various deep learning models.
Demonstrated adaptability and effectiveness on Jetson Nano and ODROID-XU4 platforms.
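
The profiling and prediction steps listed above can be illustrated with a minimal sketch. It assumes a simple per-operation, per-device linear model of latency against input size. The source does not state the model family or the features the authors use, so the linear fit, the `PROFILE` numbers, and the names `fit_linear`, `predict_latency`, and `fastest_device` are all hypothetical.

```python
def fit_linear(samples):
    """Least-squares fit of latency_ms = slope * input_size + intercept."""
    n = len(samples)
    mean_x = sum(x for x, _ in samples) / n
    mean_y = sum(y for _, y in samples) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in samples)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in samples)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Hypothetical offline profile (made-up numbers, not from the paper):
# (operation, device) -> list of (input elements, measured latency in ms)
PROFILE = {
    ("conv2d", "cpu"): [(1000, 2.5), (2000, 4.5), (4000, 8.5)],
    ("conv2d", "gpu"): [(1000, 1.5), (2000, 2.0), (4000, 3.0)],
}

# One fitted model per (operation, device) pair.
MODELS = {key: fit_linear(samples) for key, samples in PROFILE.items()}


def predict_latency(op, device, input_size):
    """Predict latency in ms for an operation on a device at a given input size."""
    slope, intercept = MODELS[(op, device)]
    return slope * input_size + intercept


def fastest_device(op, input_size):
    """Return the device with the lowest predicted latency for this input size."""
    devices = [d for (o, d) in MODELS if o == op]
    return min(devices, key=lambda d: predict_latency(op, d, input_size))
```

With these made-up numbers the preferred device changes with input size: `fastest_device("conv2d", 100)` picks the CPU, while `fastest_device("conv2d", 3000)` picks the GPU. This is the kind of input dependence that motivates predicting latency per operation rather than per model.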

Abstract

This study presents an operation‐level scheduling framework for efficient deep learning inference on heterogeneous embedded systems. It is motivated by the observation that deep neural networks comprise diverse operations whose execution latency depends strongly on the target hardware and input dimensions. The working hypothesis is that accurate latency prediction and fine‐grained scheduling of individual operations reduce end‐to‐end inference time. The framework follows a three‐stage approach: (i) offline profiling of operation latencies across varying input sizes and devices; (ii) training latency prediction models using input‐aware features; and (iii) directed acyclic graph‐based runtime scheduling to assign each operation to a central processing unit, graphics processing unit, or both. The framework is evaluated on two embedded platforms (Jetson Nano and ODROID‐XU4) and demonstrates an inference latency reduction of up to 74% across multiple deep learning models. These results indicate that the framework is adaptable, lightweight, and effective for resource‐constrained artificial intelligence deployments.
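
Stage (iii) can be illustrated with a rough sketch of scheduling over a directed acyclic graph. It uses a greedy earliest-finish-time heuristic, which is a stand-in and not necessarily the authors' algorithm. It also does not model splitting one operation across both CPU and GPU, or data-transfer costs between devices. The function `schedule`, the example graph, and the latency values are hypothetical.

```python
from graphlib import TopologicalSorter


def schedule(deps, latency):
    """Greedy earliest-finish-time placement of operations onto devices.

    deps:    op -> set of predecessor ops (the DAG)
    latency: op -> {device: predicted latency in ms}
    Returns (placement, makespan in ms).
    """
    device_free = {}  # device -> time at which it becomes idle
    finish = {}       # op -> finish time
    placement = {}    # op -> chosen device
    for op in TopologicalSorter(deps).static_order():
        # An op may start only after all of its predecessors have finished.
        ready = max((finish[p] for p in deps.get(op, ())), default=0.0)
        best = None
        for device, cost in latency[op].items():
            start = max(ready, device_free.get(device, 0.0))
            end = start + cost
            if best is None or end < best[0]:
                best = (end, device)
        finish[op], placement[op] = best
        device_free[best[1]] = best[0]
    return placement, max(finish.values())


# Hypothetical four-operation graph with two parallel branches.
deps = {
    "conv": set(),
    "pool": {"conv"},
    "branch": {"conv"},
    "concat": {"pool", "branch"},
}
# Made-up predicted latencies in ms, not taken from the paper.
latency = {
    "conv": {"cpu": 8.0, "gpu": 3.0},
    "pool": {"cpu": 1.0, "gpu": 2.0},
    "branch": {"cpu": 4.0, "gpu": 2.0},
    "concat": {"cpu": 1.0, "gpu": 1.0},
}

placement, makespan = schedule(deps, latency)
```

In this toy example the two branches run concurrently on different devices, so the makespan is 6.0 ms, compared with 14.0 ms if every operation ran sequentially on the CPU. The saving comes from overlapping independent operations and from placing each one on the device predicted to finish it first.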

Cite This Study

Kim et al. studied this question.

synapsesocial.com/papers/6980fe27c1c9540dea80fea1 https://doi.org/10.4218/etrij.2025-0201
