What question did this study set out to answer?

The aim is to develop a framework that enhances performance in inference serving across edge-to-cloud systems.

March 21, 2026 | Open Access

SynergAI: Edge-to-Cloud Synergy for Architecture-Driven High-Performance Orchestration

Foteini Stathopoulou (National Technical University of Athens), Aggelos Ferikoglou (National Technical University of Athens), Manolis Katsaragakis (National Technical University of Athens)

Key points

Introduced SynergAI framework for architecture-aware scheduling
Integrated offline and online decision-making policies
Implemented within a Kubernetes-based ecosystem
Evaluated effectiveness on heterogeneous hardware platforms
Achieved an average 2.4× reduction in QoS violations compared to a state-of-the-art solution
Enabled optimized deployments for emerging hardware
Demonstrated improved handling of computational demands for AI workloads

Abstract

The rapid evolution of Artificial Intelligence (AI) and Machine Learning (ML) has significantly heightened computational demands, particularly for inference-serving workloads. While traditional cloud-based deployments offer scalability, they face challenges such as network congestion, high energy consumption, and privacy concerns. In contrast, edge computing provides low-latency and sustainable alternatives but is constrained by limited computational resources. In this work, we introduce SynergAI, a novel framework designed for performance- and architecture-aware inference serving across heterogeneous edge-to-cloud infrastructures. Built upon a comprehensive performance characterization of modern inference engines, SynergAI integrates a combination of offline and online decision-making policies to deliver intelligent, lightweight, and architecture-aware scheduling. By dynamically allocating workloads across diverse hardware architectures, it effectively minimizes Quality of Service (QoS) violations. We implement SynergAI within a Kubernetes-based ecosystem and evaluate its efficiency. Our results demonstrate that architecture-driven inference serving enables optimized and architecture-aware deployments on emerging hardware platforms, achieving an average reduction of 2.4× in QoS violations compared to a State-of-the-Art (SotA) solution.
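As a rough illustration of how an offline performance profile and online node state might be combined for architecture-aware placement, consider the minimal sketch below. It is not SynergAI's actual algorithm: the node names, architecture labels, latency figures, and the `pick_node` helper are all hypothetical, and the policy (place each request on the node with the smallest estimated completion time, then report whether the QoS deadline is expected to hold) is an assumption made purely for illustration.

```python
from dataclasses import dataclass

# Hypothetical offline profile: inference latency in ms per (model, architecture).
# In a real system these numbers would come from a characterization phase.
OFFLINE_PROFILE = {
    ("resnet50", "cloud-x86"): 20.0,
    ("resnet50", "edge-arm"): 80.0,
    ("resnet50", "edge-gpu"): 35.0,
}


@dataclass
class Node:
    name: str
    arch: str
    queued_ms: float = 0.0  # online state: work already waiting on this node


def estimated_completion_ms(model: str, node: Node) -> float:
    """Queueing delay (online) plus profiled latency (offline)."""
    return node.queued_ms + OFFLINE_PROFILE[(model, node.arch)]


def pick_node(model: str, deadline_ms: float, nodes: list[Node]) -> tuple[Node, bool]:
    """Return the chosen node and whether the QoS deadline is expected to be met."""
    candidates = [n for n in nodes if (model, n.arch) in OFFLINE_PROFILE]
    if not candidates:
        raise ValueError(f"no profiled architecture available for {model!r}")
    best = min(candidates, key=lambda n: estimated_completion_ms(model, n))
    eta = estimated_completion_ms(model, best)
    best.queued_ms = eta  # the new request now occupies this node's queue
    return best, eta <= deadline_ms


if __name__ == "__main__":
    nodes = [
        Node("cloud-0", "cloud-x86", queued_ms=100.0),  # fast but busy
        Node("edge-0", "edge-arm"),
        Node("edge-1", "edge-gpu"),
    ]
    for _ in range(3):
        node, ok = pick_node("resnet50", deadline_ms=50.0, nodes=nodes)
        print(node.name, "meets QoS" if ok else "QoS violation expected")
```

In this toy setup the busy cloud node loses to an idle edge GPU even though its raw latency is lower, which is the kind of trade-off that combining offline and online information is meant to expose.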
