April 22, 2026 · Open Access

LAG-Guided Runtime Framework: Block-Level Scheduling and Dynamic Compression for Multi-DNN Environments

Key Points

The aim is to improve the execution efficiency of multiple deep neural networks in resource-limited environments by minimizing execution delays.
Proposed a block-level scheduling technique that divides DNN models into functional units called blocks.
Implemented a dynamic lightweight replacement method that swaps high-delay blocks for lightweight blocks at runtime.
Used the LAG metric to balance execution delay against accuracy while running multiple models.
Achieved up to a 29.3% latency improvement when executing multiple DNNs concurrently.
Maintained 90% of baseline accuracy while reducing latency.
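The block-level scheduling idea in the key points can be illustrated with a small sketch. It is not the authors' scheduler. It assumes that each model has already been split into blocks with profiled solo latencies, that blocks from two models are paired by index, and that a co-run wall-clock time has been measured for each pair. A pair is serialized when co-running it takes longer than running the two blocks back to back, which is one simple way to flag blocks whose parallel execution increases execution time. The names `Block`, `split_into_blocks`, `plan_pairs`, and `estimated_ms` are hypothetical, and all latencies are made up.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Block:
    """One functional unit of a DNN, e.g. a backbone stage or a head."""
    name: str
    solo_ms: float  # profiled latency when the block runs alone (hypothetical)


def split_into_blocks(model: str, stage_ms: List[float]) -> List[Block]:
    """Hypothetical partitioning: one block per profiled stage of the model."""
    return [Block(f"{model}/b{i}", ms) for i, ms in enumerate(stage_ms)]


PairKey = Tuple[str, str]


def plan_pairs(a_blocks: List[Block], b_blocks: List[Block],
               corun_ms: Dict[PairKey, float]) -> Dict[PairKey, str]:
    """For each index-aligned pair of blocks from two models, choose whether to
    co-run them or serialize them.

    corun_ms[(a, b)] is the measured wall-clock time to finish BOTH blocks when
    launched together. If that exceeds running them back to back, contention
    is costing time, so the pair is marked sequential.
    """
    plan = {}
    for a, b in zip(a_blocks, b_blocks):
        key = (a.name, b.name)
        back_to_back = a.solo_ms + b.solo_ms
        plan[key] = "sequential" if corun_ms[key] > back_to_back else "parallel"
    return plan


def estimated_ms(a_blocks: List[Block], b_blocks: List[Block],
                 corun_ms: Dict[PairKey, float], plan: Dict[PairKey, str]) -> float:
    """Expected total time when the pairs execute one after another under the plan."""
    total = 0.0
    for a, b in zip(a_blocks, b_blocks):
        key = (a.name, b.name)
        total += corun_ms[key] if plan[key] == "parallel" else a.solo_ms + b.solo_ms
    return total


det = split_into_blocks("det", [4.0, 6.0])  # e.g. a detector split into 2 blocks
trk = split_into_blocks("trk", [3.0, 5.0])  # e.g. a tracker split into 2 blocks
corun = {("det/b0", "trk/b0"): 5.5, ("det/b1", "trk/b1"): 12.5}  # made-up measurements
plan = plan_pairs(det, trk, corun)
print(plan)                                 # {('det/b0', 'trk/b0'): 'parallel', ('det/b1', 'trk/b1'): 'sequential'}
print(estimated_ms(det, trk, corun, plan))  # 16.5
```

With these numbers, co-running every pair would take 5.5 + 12.5 = 18.0 ms, while serializing only the contending second pair gives 16.5 ms. A real runtime would also have to handle models with different block counts and arbitrary interleavings, which the index pairing here deliberately ignores.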

Abstract

Recently, there has been a growing demand for real-time intelligent systems that can execute multiple deep neural network (DNN) models simultaneously for tasks such as object recognition, detection, and tracking. However, running multiple DNNs concurrently in a resource-constrained embedded environment leads to contention for the limited system resources. The resulting execution delays can cause critical problems in latency-sensitive processing. This paper proposes a dynamic scheduling technique that divides DNN models into functional units called blocks, which then serve as the units of execution. In addition, when different models run in parallel, the technique identifies blocks whose parallel execution actually increases execution time and forces them to run sequentially. Furthermore, to minimize execution delays while maintaining accuracy, we propose a dynamic lightweight replacement technique that, at runtime, replaces blocks with high anticipated execution delays with lightweight blocks. This technique uses LAG, a metric that quantifies the degree of execution delay for each block, to dynamically adjust the balance between execution delay and accuracy. Experimental results show that when multiple heterogeneous DNNs run simultaneously on a commercial off-the-shelf board, the proposed technique improves latency by up to 29.3% while maintaining 90% of baseline accuracy.
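The abstract defines LAG only as a per-block measure of execution delay, so the following is a minimal sketch under stated assumptions rather than the paper's method. Here LAG is assumed to be the relative slowdown (observed − solo) / solo of a block under contention. Each block's lightweight variant carries a hypothetical accuracy-retention factor, and accuracy is modeled as the product of those factors. Blocks are swapped greedily from the highest LAG down until a latency budget is met, skipping any swap that would push relative accuracy below 0.90. That floor echoes the reported 90% of baseline accuracy but is an assumption, not a rule stated in the source. `BlockState`, `lag`, `current_ms`, `adapt`, and `light_acc_keep` are hypothetical names, and all numbers are made up.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class BlockState:
    """Runtime view of one block; all numbers are hypothetical profiling data."""
    name: str
    solo_ms: float              # latency when the block runs alone
    observed_ms: float          # latency measured under multi-DNN contention
    light_ms: Optional[float]   # expected latency of a lightweight variant; None if none exists
    light_acc_keep: float       # assumed fraction of accuracy kept if this block is swapped
    replaced: bool = False


def lag(b: BlockState) -> float:
    """Hypothetical LAG: relative slowdown of the block under contention."""
    return (b.observed_ms - b.solo_ms) / b.solo_ms


def current_ms(b: BlockState) -> float:
    """Latency the block is expected to contribute with its current variant."""
    return b.light_ms if b.replaced and b.light_ms is not None else b.observed_ms


def adapt(blocks: List[BlockState], budget_ms: float,
          min_acc: float = 0.90) -> Tuple[float, float]:
    """Greedily swap the most-delayed blocks for lightweight variants until the
    latency budget is met, never letting relative accuracy fall below min_acc."""
    acc = 1.0  # accuracy relative to the uncompressed baseline (multiplicative model)
    for b in sorted(blocks, key=lag, reverse=True):  # highest LAG first
        if sum(current_ms(x) for x in blocks) <= budget_ms:
            break  # budget met: stop compressing
        if b.light_ms is None:
            continue  # no lightweight variant available for this block
        if acc * b.light_acc_keep < min_acc:
            continue  # this swap would break the accuracy floor
        b.replaced = True
        acc *= b.light_acc_keep
    return sum(current_ms(x) for x in blocks), acc


blocks = [
    BlockState("b0", solo_ms=4.0, observed_ms=5.0, light_ms=3.0, light_acc_keep=0.97),
    BlockState("b1", solo_ms=6.0, observed_ms=10.0, light_ms=5.0, light_acc_keep=0.95),
    BlockState("b2", solo_ms=5.0, observed_ms=8.0, light_ms=None, light_acc_keep=1.0),
    BlockState("b3", solo_ms=3.0, observed_ms=6.0, light_ms=2.0, light_acc_keep=0.93),
]
total_ms, rel_acc = adapt(blocks, budget_ms=22.0)
print(total_ms, round(rel_acc, 4))             # 23.0 0.9021
print([b.name for b in blocks if b.replaced])  # ['b0', 'b3']
```

With these numbers, b3 (the highest LAG) is swapped first. b1 is skipped because swapping it would drop relative accuracy to about 0.88, b2 has no lightweight variant, and b0 is swapped last. The 22 ms budget is not quite reached (23 ms), which shows the accuracy floor taking priority over latency in this sketch. A greedy pass is only one possible policy; the paper's runtime may weigh LAG against accuracy differently.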
