Abstract
The rapid growth of AI-driven applications in hybrid cloud–edge environments poses substantial challenges for achieving low latency, high throughput, and efficient resource utilization. Conventional deployment models, which are typically static or policy-driven, lack the flexibility to respond dynamically to changing workloads and heterogeneous hardware. In this work, we introduce and analyze a resource-aware, deep learning-based scheduling system for deploying AI models across distributed cloud and edge resources. The framework improves inference performance by combining real-time system telemetry with benchmark-derived model features while maintaining quality-of-service (QoS) compliance. The scheduler uses a fully connected neural network trained on structured features derived from the MLPerf Inference Benchmark, including compute complexity, memory footprint, and input dimensions, and its decisions are guided by real-time telemetry from a hybrid infrastructure of NVIDIA A100/V100 GPUs and Jetson Xavier edge devices. Four MLPerf inference workloads (ResNet-50, BERT, SSD-ResNet34, and DLRM) were evaluated and compared across a range of batch sizes and latency thresholds. Generalization experiments with unseen models, including GPT-2 and YOLOv5, achieved deployment success rates above 90% across the profiling conditions evaluated, together with reductions in latency and gains in throughput.
The results show that learning-based orchestration can deliver adaptive, space- and resource-aware solutions for low-latency deployment of AI services in hybrid cloud–edge systems, although its effectiveness depends on how representative the profiling data are and how closely the training environment matches the deployment environment.
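As a rough illustration of the scheduling loop described above, the Python sketch below scores each candidate node with a small fully connected network that maps benchmark-derived model features plus live telemetry to a predicted latency, then picks the fastest node that fits in memory and meets the latency threshold. Everything in it is an assumption for illustration, not the authors' implementation: the feature set and its normalization, the names `ModelProfile`, `NodeTelemetry`, `FullyConnectedNet`, `make_features`, and `schedule`, the hand-set (untrained) weights, and the telemetry values are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple


@dataclass
class ModelProfile:
    """Benchmark-derived model features (hypothetical field names)."""
    gflops: float       # compute complexity per sample
    memory_mb: float    # memory footprint required on the device
    input_size: int     # flattened input dimension
    batch_size: int


@dataclass
class NodeTelemetry:
    """Real-time telemetry for one candidate node (hypothetical fields)."""
    name: str
    gpu_util: float        # current utilization in [0, 1]
    free_memory_mb: float
    peak_tflops: float


def relu(x: float) -> float:
    return max(0.0, x)


class FullyConnectedNet:
    """One hidden ReLU layer; the scalar output is a predicted latency in ms."""

    def __init__(self, w1, b1, w2, b2):
        self.w1, self.b1, self.w2, self.b2 = w1, b1, w2, b2

    def forward(self, x: Sequence[float]) -> float:
        hidden = [relu(sum(w * xi for w, xi in zip(row, x)) + b)
                  for row, b in zip(self.w1, self.b1)]
        # Final ReLU keeps the latency estimate non-negative.
        return relu(sum(w * h for w, h in zip(self.w2, hidden)) + self.b2)


def make_features(model: ModelProfile, node: NodeTelemetry) -> List[float]:
    # Concatenate roughly normalized model features with live telemetry.
    return [
        model.gflops * model.batch_size / 1e3,
        model.memory_mb / 1e3,
        model.input_size / 1e5,
        node.gpu_util,
        node.free_memory_mb / 1e3,
        node.peak_tflops / 1e2,
    ]


def schedule(model: ModelProfile, nodes: Sequence[NodeTelemetry],
             net: FullyConnectedNet,
             latency_slo_ms: float) -> Optional[Tuple[str, float]]:
    """Return (node name, predicted latency) of the fastest node meeting the SLO."""
    best = None
    for node in nodes:
        if node.free_memory_mb < model.memory_mb:
            continue  # model does not fit on this node
        predicted = net.forward(make_features(model, node))
        if predicted <= latency_slo_ms and (best is None or predicted < best[1]):
            best = (node.name, predicted)
    return best  # None means no node can satisfy the QoS constraint


# Illustrative, untrained weights; a real scheduler would learn them from profiling data.
net = FullyConnectedNet(
    w1=[[1, 0, 0, 1, 0, -1],
        [0, 1, 1, 0, 0, -1]],
    b1=[0.5, 0.0],
    w2=[20.0, 5.0],
    b2=1.0,
)
# Illustrative model and telemetry values, not measurements.
model = ModelProfile(gflops=4.0, memory_mb=100.0, input_size=150528, batch_size=8)
nodes = [
    NodeTelemetry("a100", gpu_util=0.2, free_memory_mb=30000.0, peak_tflops=312.0),
    NodeTelemetry("jetson-xavier", gpu_util=0.5, free_memory_mb=4000.0, peak_tflops=32.0),
]
print(schedule(model, nodes, net, latency_slo_ms=10.0))  # → ('a100', 1.0)
```

Returning `None` when no node meets the threshold leaves the fallback policy (queueing, reducing the batch size, or rejecting the request) to the caller; swapping in weights trained on profiling data would not change the selection logic.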
Author: Ahmed Albugmi.