
March 12, 2026 · Open Access

PPO-Graph Explorer: A New Method for Flexible Job Shop Scheduling via Entropy-Guided Attention Networks

Key Points

Aims to develop a scheduling framework that combines the training stability of PPO with stronger exploration for flexible job-shop scheduling
Integrates a Graph Isomorphism Attention Network (GIAN) with an Entropy-Adjusted PPO (EAE-PPO)
Uses a structure-aware hybrid GIAN design tailored to the disjunctive graph topology of FJSP
Conducts extensive experiments on standard benchmarks including Brandimarte, Hurink, and Dauzère–Pérès
Reduces average makespan gap by 5.1 percentage points compared to state-of-the-art DRL methods
Achieves an 8.95% reduction in makespan on representative instances
Increases average machine utilization from 89.0% to 98.1%
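The Key Points mix two kinds of figures: a gap reduction measured in percentage points and a relative makespan reduction measured in percent. The sketch below shows how such metrics are commonly computed from a finished schedule. The page does not state the paper's exact definitions, so the formulas here (gap relative to a reference makespan such as a best-known bound; utilization as total busy time divided by machines × makespan) and the hypothetical `ScheduledOp` record are assumptions for illustration only.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class ScheduledOp:
    """One operation placed on a machine (hypothetical schedule record, not from the paper)."""
    job: int
    machine: int
    start: float
    end: float


def makespan(schedule: list[ScheduledOp]) -> float:
    """C_max: completion time of the last operation in the schedule."""
    return max(op.end for op in schedule)


def avg_machine_utilization(schedule: list[ScheduledOp], num_machines: int) -> float:
    """Assumed definition: total busy time divided by num_machines * makespan."""
    busy = sum(op.end - op.start for op in schedule)
    return busy / (num_machines * makespan(schedule))


def gap_percent(cmax: float, reference: float) -> float:
    """Assumed definition: relative gap (%) to a reference makespan, e.g. a best-known bound."""
    return 100.0 * (cmax - reference) / reference


def relative_reduction_percent(old: float, new: float) -> float:
    """Relative makespan reduction (%) when moving from `old` to `new`."""
    return 100.0 * (old - new) / old


# Toy 2-job, 2-machine schedule (illustrative values only).
example_schedule = [
    ScheduledOp(job=0, machine=0, start=0, end=3),
    ScheduledOp(job=0, machine=1, start=3, end=5),
    ScheduledOp(job=1, machine=1, start=0, end=3),
    ScheduledOp(job=1, machine=0, start=3, end=4),
]

print(makespan(example_schedule))                    # 5
print(avg_machine_utilization(example_schedule, 2))  # 0.9
print(gap_percent(5, 4))                             # 25.0
print(relative_reduction_percent(5, 4))              # 20.0
```

Under a gap metric of this form, a drop from, say, a 12.0% gap to a 6.9% gap is a 5.1 percentage-point reduction, whereas `relative_reduction_percent` expresses a change in the makespan itself; the two numbers are not interchangeable.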

Abstract

The Flexible Job-shop Scheduling Problem (FJSP), a pivotal NP-hard challenge in intelligent manufacturing, has increasingly been addressed with Deep Reinforcement Learning (DRL) methods. However, existing approaches face a dilemma: Proximal Policy Optimization (PPO) ensures stable training but explores conservatively, while Soft Actor–Critic (SAC) explores more aggressively but lacks stability in discrete scheduling spaces. To resolve this trade-off, this study proposes PPO-Graph Explorer, a novel framework that integrates a Graph Isomorphism Attention Network (GIAN) with an Entropy-Adjusted PPO (EAE-PPO). Unlike generic Graph Transformers, our GIAN employs a structure-aware hybrid design tailored specifically to the disjunctive graph topology of FJSP. EAE-PPO introduces a structured exploration curriculum that lets the agent mimic aggressive search behaviors early in training without sacrificing on-policy stability. Extensive experiments on standard benchmarks (Brandimarte, Hurink, Dauzère–Pérès) show that our method outperforms existing approaches. Compared with state-of-the-art DRL baselines, it reduces the average makespan gap by 5.1 percentage points with zero statistical outliers. Qualitative analysis of representative instances further reveals an 8.95% reduction in makespan, accompanied by a rise in average machine utilization from 89.0% to 98.1%.
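The abstract describes EAE-PPO only at a high level: PPO's clipped on-policy update combined with a structured exploration curriculum. One common way to realize such a curriculum is to anneal the weight of an entropy bonus from a high value early in training to a low value later. The pure-Python sketch below assumes exactly that, using a linear schedule with hypothetical `c_start`/`c_end` values and a masked categorical policy over feasible operation–machine choices; it omits the value-function loss and is not the authors' implementation.

```python
from __future__ import annotations

import math


def masked_softmax(logits: list[float], mask: list[bool]) -> list[float]:
    """Softmax restricted to feasible actions; infeasible actions get probability 0.

    Assumes at least one action is feasible, as in any non-terminal FJSP state.
    """
    best = max(x for x, m in zip(logits, mask) if m)  # subtract max for numerical stability
    exps = [math.exp(x - best) if m else 0.0 for x, m in zip(logits, mask)]
    total = sum(exps)
    return [e / total for e in exps]


def entropy(probs: list[float]) -> float:
    """Shannon entropy of a categorical distribution (zero-probability actions skipped)."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def entropy_coef(step: int, total_steps: int, c_start: float = 0.05, c_end: float = 0.005) -> float:
    """Hypothetical exploration curriculum: linear decay of the entropy weight."""
    frac = min(max(step / total_steps, 0.0), 1.0)
    return c_start + frac * (c_end - c_start)


def ppo_clip_term(ratio: float, advantage: float, clip_eps: float = 0.2) -> float:
    """Per-sample PPO clipped surrogate min(r*A, clip(r, 1-eps, 1+eps)*A), to be maximized."""
    clipped = min(max(ratio, 1.0 - clip_eps), 1.0 + clip_eps)
    return min(ratio * advantage, clipped * advantage)


def eae_ppo_loss(ratios, advantages, entropies, step, total_steps, clip_eps=0.2):
    """Loss to minimize: -(mean clipped surrogate) - c(step) * (mean policy entropy).

    The value-function loss that PPO normally adds is omitted for brevity.
    """
    n = len(ratios)
    surrogate = sum(ppo_clip_term(r, a, clip_eps) for r, a in zip(ratios, advantages)) / n
    mean_entropy = sum(entropies) / n
    return -surrogate - entropy_coef(step, total_steps) * mean_entropy


# Usage with made-up numbers: one decision among 3 candidate actions, the middle one infeasible.
probs = masked_softmax([1.0, 2.0, 3.0], [True, False, True])
h = entropy(probs)
early = eae_ppo_loss([1.1], [0.5], [h], step=0, total_steps=1000)
late = eae_ppo_loss([1.1], [0.5], [h], step=1000, total_steps=1000)
print(early < late)  # True: the entropy bonus weighs more early in training
```

In this sketch a high early coefficient keeps the action distribution flatter, so the agent tries more dispatching choices, while the clipped ratio still bounds each policy update; as the coefficient decays, the objective approaches standard PPO.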
