Recent years have witnessed increasing interest in machine learning inference on serverless computing because of its auto-scaling and cost-effective properties. Existing serverless computing, however, lacks effective job scheduling methods to handle a scheduling space that is dramatically expanded by GPU sharing, task batching, and inter-task relations. Prior solutions have sidestepped the issue by neglecting some of these factors, leaving substantial performance potential untapped. This paper presents ESG, a new scheduling algorithm that directly addresses these difficulties. ESG treats sharable GPUs as a first-order factor in scheduling. It employs an optimality-guided adaptive method that combines A*-search with a novel dual-blade pruning to dramatically prune the scheduling space without compromising quality. It further introduces a novel method, dominator-based SLO distribution, to ensure the scalability of the scheduler. The results show that ESG improves SLO hit rates by 61%-80% while saving 47%-187% in cost over prior work.
Hui et al. studied this question.
www.synapsesocial.com/papers/68e6dac2b6db643587657686 — DOI: https://doi.org/10.48550/arxiv.2404.16812
Xinning Hui
Yuanchao Xu
Zhishan Guo