May 6, 2024

DAS: A DRL-Based Scheme for Workload Allocation and Worker Selection in Distributed Coded Machine Learning

Abstract

Machine Learning (ML) has been applied successfully to problems in diverse domains, such as robotics, healthcare, and finance. However, high-complexity ML algorithms often require excessive computation time, which limits their practicality. Distributed Machine Learning (DML) addresses this by splitting the computation across multiple devices. With DML, however, the results from all participating computing devices must be collected to complete an ML task. When some of the participating devices, known as stragglers, fail to return their results in time, the overall computation time increases. Distributed Coded Machine Learning (DCML) is a promising solution for mitigating the impact of stragglers. With DCML, redundancy is injected into an ML task so that only a subset of the results from the participating devices is required to finish the task. In DCML, selecting suitable participating devices, referred to as workers, and allocating appropriate workloads to the selected workers are two challenging problems. In this paper, we consider a DCML scenario in which numerous computing devices are available for an ML task. These devices are willing to offer their computation capacity in exchange for compensation. To encourage the computing devices to participate in the distributed computation, a reverse auction-based incentive mechanism is employed. With the objective of minimizing both the completion time of the ML task and the compensation paid to participating devices, we propose DAS, a Deep reinforcement learning-based workload Allocation and worker Selection scheme for DCML. To our knowledge, this is the first attempt to tackle the workload allocation and worker selection problems in DCML simultaneously. Our experimental results indicate that DAS outperforms state-of-the-art schemes in terms of both completion time and compensation.
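
The straggler-tolerance idea behind DCML can be illustrated with a toy coded matrix-vector product. The sketch below is not the scheme proposed in the paper. It assumes a simple (3, 2) parity code, three hypothetical workers named `w1`, `w2`, and `w3`, and a matrix with an even number of rows. Each worker multiplies one coded block by `x`, and the full product is recovered from whichever two workers respond first.

```python
from itertools import combinations


def matvec(rows, x):
    """Plain matrix-vector product on lists of numbers."""
    return [sum(a * b for a, b in zip(row, x)) for row in rows]


def encode(A):
    """Split A into two row blocks and add one parity block (assumed (3, 2) code)."""
    half = len(A) // 2  # assumes an even number of rows
    A1, A2 = A[:half], A[half:]
    parity = [[a + b for a, b in zip(r1, r2)] for r1, r2 in zip(A1, A2)]
    return {"w1": A1, "w2": A2, "w3": parity}


def decode(results):
    """Recover A @ x from the results of any two of the three workers."""
    if "w1" in results and "w2" in results:
        y1, y2 = results["w1"], results["w2"]
    elif "w1" in results:
        # w2 is the straggler: A2 @ x = (A1 + A2) @ x - A1 @ x
        y1 = results["w1"]
        y2 = [p - a for p, a in zip(results["w3"], y1)]
    else:
        # w1 is the straggler: A1 @ x = (A1 + A2) @ x - A2 @ x
        y2 = results["w2"]
        y1 = [p - b for p, b in zip(results["w3"], y2)]
    return y1 + y2


A = [[1, 2], [3, 4], [5, 6], [7, 8]]
x = [1, -1]
tasks = encode(A)
expected = matvec(A, x)

# Every choice of two responding workers yields the same full result.
for fast in combinations(tasks, 2):
    results = {w: matvec(tasks[w], x) for w in fast}
    assert decode(results) == expected
```

The redundancy has a cost: three workers each process half of `A`, so the total work is 1.5 times that of the uncoded task. How much work to give each worker, and which workers to recruit, is the trade-off that a workload allocation and worker selection scheme has to manage.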
