The computing continuum paradigm brings new challenges due to the heterogeneity of computing entities in deployments. These challenges primarily affect the ability to maintain appropriate Quality of Service (QoS) and Service-Level Agreement (SLA) values when assigning workloads and requests to nodes, particularly with respect to response times and application-specific metrics. Moreover, in scenarios where computing elements are inherently dynamic in terms of availability, computing power, or latency, efficiently assigning workloads to the most appropriate computing element becomes even more challenging. Current generic orchestrators, like Kubernetes, have proven effective in homogeneous and static environments, where the usual QoS-unaware scheduling strategies focus mainly on load balancing and neglect aspects such as reducing latency or constraining application-level metrics. In this study, we show that generic orchestrators like Kubernetes fall short when QoS-agnostic policies are applied to heterogeneous and dynamic edge-to-cloud environments. We introduce Qafhe, a novel framework that integrates seamlessly into Kubernetes and provides a set of QoS-aware scheduling policies to address the heterogeneity and dynamicity found in many edge-to-cloud setups. Our experiments, which deploy inference servers across diverse nodes, show up to 5× improvement in response times across various dynamic scenarios involving devices with heterogeneous compute capabilities, such as multi-core CPUs and diverse GPU types.
Cámara-Miró et al. (Thu,) studied this question.