The Flexible Job-shop Scheduling Problem (FJSP), a pivotal NP-hard problem in intelligent manufacturing, is increasingly addressed with Deep Reinforcement Learning (DRL). Existing approaches, however, face a trade-off: Proximal Policy Optimization (PPO) trains stably but explores conservatively, whereas Soft Actor–Critic (SAC) explores more aggressively but is unstable in discrete scheduling spaces. To resolve this trade-off, this study proposes PPO-Graph Explorer, a framework that integrates a Graph Isomorphism Attention Network (GIAN) with an Entropy-Adjusted PPO (EAE-PPO). Unlike generic Graph Transformers, GIAN employs a structure-aware hybrid design tailored to FJSP’s disjunctive graph topology. EAE-PPO introduces a structured exploration curriculum that lets the agent mimic aggressive search behaviors early in training without sacrificing on-policy stability. In extensive experiments on standard benchmarks (Brandimarte, Hurink, Dauzère–Pérès), the method reduces the average makespan gap by 5.1 percentage points relative to state-of-the-art DRL baselines, with zero statistical outliers. Qualitative analysis further shows an 8.95% reduction in makespan on representative instances, accompanied by an increase in average machine utilization from 89.0% to 98.1%.
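The abstract does not specify how the exploration curriculum is scheduled. As a rough illustration only, one common way to encourage early exploration while keeping PPO's clipped on-policy update is to anneal the weight of an entropy bonus over training. The sketch below assumes a linear decay; the function names (`entropy_coef`, `categorical_entropy`, `ppo_objective`) and the values `start=0.05`, `end=0.001`, and `clip_eps=0.2` are hypothetical choices for illustration, not the authors' EAE-PPO.

```python
import math


def entropy_coef(step, total_steps, start=0.05, end=0.001):
    """Linearly anneal the entropy-bonus weight from `start` to `end`.

    Assumption: a linear schedule; the paper's actual curriculum may differ.
    """
    frac = min(max(step / total_steps, 0.0), 1.0)
    return start + frac * (end - start)


def categorical_entropy(probs):
    """Shannon entropy (in nats) of a discrete action distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def ppo_objective(ratio, advantage, probs, step, total_steps, clip_eps=0.2):
    """Per-sample clipped PPO surrogate plus a scheduled entropy bonus.

    ratio     : pi_new(a|s) / pi_old(a|s) for the sampled action
    advantage : advantage estimate for that action
    probs     : current policy's probabilities over the candidate actions
    The returned value is to be maximized.
    """
    clipped = min(max(ratio, 1.0 - clip_eps), 1.0 + clip_eps)
    surrogate = min(ratio * advantage, clipped * advantage)
    bonus = entropy_coef(step, total_steps) * categorical_entropy(probs)
    return surrogate + bonus
```

Under this schedule the entropy term carries more weight at the start of training, favoring broad search over candidate operation-machine assignments, and fades toward the end, while the clipped surrogate bounds each policy update throughout.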
Tan et al. (Mon,) studied this question.