Data analytics frameworks are shifting toward larger degrees of parallelism. Efficient scheduling of data-parallel jobs (tasks) is critical for improving both job performance, such as response time, and resource utilization. This is an important challenge for large-scale data analytics frameworks, in which jobs are more complex and have diverse characteristics (e.g., diverse resource requirements). Prior work on scheduling cannot achieve low response time and high resource utilization simultaneously. Because it relies on sampling-based approaches (including sampling with late binding) for task placement, it cannot accurately estimate the durations of tasks in a worker machine's queue, and thus fails to place tasks on the best possible worker machine. It also does not sufficiently consider the diverse resource requirements of jobs (tasks) when placing tasks on worker machines. To address this challenge, we propose Dependency-aware and Resource-efficient Scheduling (DRS) to achieve low response time and high resource utilization. DRS takes task dependency into account and assigns tasks that are independent of each other to different worker machines. DRS also considers tasks' resource requirements and packs together complementary tasks, i.e., tasks whose demands on multiple resources complement each other, to increase resource utilization. In addition, DRS uses mutual reinforcement learning to estimate a task's waiting time (the duration of tasks in a worker's queue) and takes this waiting time into account when assigning tasks to workers, to reduce response time. Extensive experiments on a real cluster and on the real-world Amazon EC2 cloud service show that DRS achieves lower response time and higher resource utilization than previous strategies.
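To make the placement idea above concrete, the following is a minimal Python sketch and not the DRS algorithm itself. It assumes two resources (CPU and memory, as fractions of one worker), takes each worker's estimated waiting time as a given number rather than learning it with mutual reinforcement learning, and does not implement the rule that spreads independent tasks across workers. The names `Task`, `Worker`, `ready_tasks`, `place`, and the `wait_weight` parameter are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Task:
    name: str
    cpu: float                      # CPU demand as a fraction of one worker
    mem: float                      # memory demand as a fraction of one worker
    deps: frozenset = frozenset()   # names of tasks that must finish first


@dataclass
class Worker:
    name: str
    est_wait: float                 # estimated queue waiting time (assumed given)
    cpu_free: float = 1.0
    mem_free: float = 1.0
    tasks: list = field(default_factory=list)


def ready_tasks(tasks, done):
    """Return tasks whose dependencies have all completed."""
    return [t for t in tasks if t.deps <= done]


def place(task, workers, wait_weight=0.1):
    """Place `task` on the feasible worker with the lowest illustrative cost.

    The cost (an assumption of this sketch) adds two terms:
    - the imbalance between leftover CPU and leftover memory, which is small
      when the task's demands complement what the worker already hosts;
    - the worker's estimated waiting time, scaled by `wait_weight`.
    Returns the chosen worker, or None if the task fits nowhere.
    """
    best, best_cost = None, None
    for w in workers:
        cpu_left = w.cpu_free - task.cpu
        mem_left = w.mem_free - task.mem
        if cpu_left < 0 or mem_left < 0:
            continue  # task does not fit on this worker
        cost = abs(cpu_left - mem_left) + wait_weight * w.est_wait
        if best_cost is None or cost < best_cost:
            best, best_cost = w, cost
    if best is not None:
        best.cpu_free -= task.cpu
        best.mem_free -= task.mem
        best.tasks.append(task.name)
    return best


# Usage: a CPU-heavy and a memory-heavy task are complementary.
workers = [Worker("w1", est_wait=0.0), Worker("w2", est_wait=0.0)]
a = Task("a", cpu=0.75, mem=0.25)
b = Task("b", cpu=0.25, mem=0.75)
c = Task("c", cpu=0.5, mem=0.5, deps=frozenset({"a"}))

for t in ready_tasks([a, b, c], done=set()):   # only a and b are ready
    place(t, workers)
```

In this usage, tasks `a` and `b` end up on the same worker because together they fill both resources exactly, while `c` is held back until `a` has completed. A larger `wait_weight` would shift the trade-off toward workers with shorter queues.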
Liu et al. studied this question.