As deep learning develops and spreads, datasets and network models keep growing, and distributed model training is becoming increasingly common. This article proposes a deep-learning-based algorithm for distributed heterogeneous task scheduling and resource allocation. It targets four problems that arise during distributed collaborative training: heterogeneous resource usage, the inability to predict task convergence time, communication-time bottlenecks, and the resource waste caused by static resource allocation. The algorithm schedules heterogeneous tasks and allocates resources dynamically, reducing task completion time in clusters. Experiments show that the proposed algorithm significantly improves both task completion time and system duration.
Qiu et al. (Thu,) studied this question.