
March 25, 2026

Deep Reinforcement Learning for Cloud Manufacturing Task Scheduling via Sparse‐Reward Optimization

Key Points

The aim is to optimize task scheduling in cloud manufacturing (CMfg) by addressing sparse rewards and inefficient exploration in high-dimensional action spaces.
Developed a multi-task scheduling model of a CMfg environment and formulated it as a Markov Decision Process (MDP).
Implemented a DRL scheduling strategy built on a double-delayed deep Q-network architecture.
Incorporated prioritized experience replay and a two-dimensional action space to improve the stability and generality of scheduling decisions.
Used potential-based reward shaping and curiosity-driven exploration to mitigate sparse rewards and improve learning efficiency.
In numerical experiments, SPIRIT-PD3QN outperforms mainstream scheduling algorithms in overall task scheduling performance.
Achieved competitive results across multiple evaluation metrics compared with mainstream DRL scheduling methods.
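The key points name a double-delayed deep Q-network, prioritized experience replay, and a two-dimensional action space without describing how they work. The block below is a minimal sketch of generic versions of these components, not the authors' implementation. It assumes that "double-delayed" means a Double DQN target scored by a periodically synchronized (delayed) target network, that the two action dimensions are a (task, resource) pair flattened into a single Q-value index, and that replay uses proportional prioritization. All function and class names are hypothetical.

```python
from __future__ import annotations

import random


def flatten_action(task: int, resource: int, n_resources: int) -> int:
    """Map a hypothetical (task, resource) action pair to one Q-vector index."""
    return task * n_resources + resource


def unflatten_action(index: int, n_resources: int) -> tuple[int, int]:
    """Inverse of flatten_action: recover the (task, resource) pair."""
    return divmod(index, n_resources)


def double_dqn_target(reward: float, done: bool, q_online_next: list[float],
                      q_target_next: list[float], gamma: float = 0.99) -> float:
    """Double DQN target: the online net picks a', the delayed target net scores it."""
    if done:
        return reward
    best = max(range(len(q_online_next)), key=lambda a: q_online_next[a])
    return reward + gamma * q_target_next[best]


class PrioritizedReplay:
    """Proportional prioritized replay; list-based O(N) sampling kept simple for clarity."""

    def __init__(self, capacity: int, alpha: float = 0.6, eps: float = 1e-5):
        self.capacity, self.alpha, self.eps = capacity, alpha, eps
        self.data: list = []
        self.priorities: list[float] = []
        self.pos = 0  # next slot to overwrite once the buffer is full

    def add(self, transition) -> None:
        # New transitions get the current maximum priority so they are sampled with high probability.
        max_p = max(self.priorities, default=1.0)
        if len(self.data) < self.capacity:
            self.data.append(transition)
            self.priorities.append(max_p)
        else:
            self.data[self.pos] = transition
            self.priorities[self.pos] = max_p
        self.pos = (self.pos + 1) % self.capacity

    def sample(self, batch_size: int, beta: float = 0.4, rng: random.Random | None = None):
        rng = rng or random.Random()
        scaled = [p ** self.alpha for p in self.priorities]
        total = sum(scaled)
        probs = [s / total for s in scaled]  # P(i) = p_i^alpha / sum_k p_k^alpha
        n = len(self.data)
        idx = rng.choices(range(n), weights=probs, k=batch_size)
        # Importance-sampling weights, normalized by the largest possible weight in the buffer.
        w_max = (n * min(probs)) ** (-beta)
        weights = [(n * probs[i]) ** (-beta) / w_max for i in idx]
        return idx, [self.data[i] for i in idx], weights

    def update(self, indices: list[int], td_errors: list[float]) -> None:
        # Priority is the absolute TD error plus a small constant so no transition is starved.
        for i, delta in zip(indices, td_errors):
            self.priorities[i] = abs(delta) + self.eps
```

In a training loop, the TD errors measured against double_dqn_target would be passed back through update, so poorly predicted transitions are replayed more often, while the returned weights scale each transition's loss to offset the bias that non-uniform sampling introduces.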

Abstract

Task scheduling in cloud manufacturing (CMfg) systems is challenging because distributed, heterogeneous resources must be coordinated. While CMfg enables virtualization and service‐oriented collaboration, competition among tasks and dependencies between them further complicate efficient, real‐time resource allocation. Deep reinforcement learning (DRL) has emerged as a promising approach to CMfg task scheduling; however, existing DRL methods suffer from sparse rewards and inefficient exploration in high‐dimensional action spaces. To address these challenges, this paper proposes SPIRIT‐PD3QN, a novel DRL approach for hybrid task scheduling. We first construct a multi‐task scheduling model of a CMfg environment and formulate it as a Markov Decision Process (MDP). Building on this model and the MDP formulation, we design a DRL scheduling strategy that employs a double‐delayed deep Q‐network architecture, combined with a prioritized experience replay mechanism and a two‐dimensional action space, to improve the stability and generality of scheduling decisions. Furthermore, potential‐based reward shaping and curiosity‐driven exploration are integrated to mitigate the sparse‐reward problem and improve learning efficiency. Numerical experiments show that the proposed method outperforms mainstream scheduling algorithms in overall task scheduling performance and achieves competitive results across multiple evaluation metrics compared with mainstream DRL scheduling approaches.
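Potential-based reward shaping and curiosity-driven exploration are both standard ways of densifying a sparse reward signal. The sketch below shows textbook forms of each, not SPIRIT‐PD3QN's actual reward design: the shaping term F(s, s′) = γΦ(s′) − Φ(s), which leaves the optimal policy unchanged, and an intrinsic reward equal to the prediction error of a learned forward model. The potential based on remaining work, the linear forward model, and the coefficients lr and eta are illustrative assumptions.

```python
from __future__ import annotations


def potential(remaining_work: float, total_work: float) -> float:
    """Hypothetical potential: minus the fraction of work not yet scheduled.

    It reaches 0 once all work is scheduled, so the terminal potential is 0.
    """
    return -remaining_work / total_work


def shaped_reward(r_ext: float, phi_s: float, phi_next: float, gamma: float = 0.99) -> float:
    """Extrinsic reward plus the potential-based term F(s, s') = gamma * Phi(s') - Phi(s)."""
    return r_ext + gamma * phi_next - phi_s


class LinearForwardModel:
    """Hypothetical curiosity module: a linear model predicting next-state features.

    The intrinsic reward is eta * 0.5 * ||prediction - s_next||^2, so transitions the
    model already predicts well earn little bonus.
    """

    def __init__(self, state_dim: int, n_actions: int, lr: float = 0.05, eta: float = 0.1):
        self.n_actions, self.lr, self.eta = n_actions, lr, eta
        in_dim = state_dim + n_actions
        self.W = [[0.0] * in_dim for _ in range(state_dim)]

    def _features(self, s: list[float], a: int) -> list[float]:
        onehot = [0.0] * self.n_actions
        onehot[a] = 1.0
        return list(s) + onehot  # state features concatenated with a one-hot action

    def intrinsic_reward(self, s: list[float], a: int, s_next: list[float]) -> float:
        x = self._features(s, a)
        pred = [sum(w * xj for w, xj in zip(row, x)) for row in self.W]
        err = [p - t for p, t in zip(pred, s_next)]
        bonus = self.eta * 0.5 * sum(e * e for e in err)
        # One SGD step on 0.5 * ||err||^2 (gradient for W[i][j] is err[i] * x[j]).
        for row, e in zip(self.W, err):
            for j, xj in enumerate(x):
                row[j] -= self.lr * e * xj
        return bonus
```

With γ = 1 the shaping terms telescope, so their sum over an episode is Φ(terminal) − Φ(start), which depends only on the start state; this is why potential-based shaping does not change which policy is optimal. The curiosity bonus, by contrast, shrinks as the forward model learns a transition, which pushes exploration toward unfamiliar scheduling states. How these terms are weighted against the extrinsic scheduling reward is a tuning choice the abstract does not specify.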
