What question did this study set out to answer?

This work aims to improve action-masked reinforcement learning for flexible job-shop scheduling by introducing a structured state representation.

May 26, 2026 | Open Access

Block-Wise State Encoding for Action-Masked Reinforcement Learning in Flexible Job-Shop Scheduling

Key Points

Developed a block-wise state representation that splits the scheduling state into three components: resource availability, operation readiness, and temporal attributes of operation–machine alternatives.
Compared a multi-branch feature extraction module against a single-branch multilayer perceptron baseline under identical PPO hyperparameters and training conditions.
Evaluated the approach on the Brandimarte MK benchmark suite and on selected Behnke and Geiger instances.
The proposed architecture achieved a lower best-achieved makespan on nine of the ten Brandimarte MK instances.
Improvements over the best baseline result reached up to 27.84%.
Additional validation indicated that the block-wise encoder's advantage extends to larger FJSP cases while preserving sub-second inference.
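
The summary does not state how the percentage improvement is computed. A minimal sketch, assuming the common convention of measuring the makespan reduction relative to the baseline, is shown below. The function name and the input values are hypothetical and are not results from the study.

```python
def relative_improvement(baseline_makespan: float, proposed_makespan: float) -> float:
    """Percentage reduction in makespan relative to the baseline.

    Assumed convention (not confirmed by the source):
        100 * (baseline - proposed) / baseline
    A positive value means the proposed method produced a shorter schedule.
    """
    if baseline_makespan <= 0:
        raise ValueError("baseline makespan must be positive")
    return 100.0 * (baseline_makespan - proposed_makespan) / baseline_makespan


# Hypothetical makespans, chosen only to illustrate the arithmetic.
example = relative_improvement(baseline_makespan=200.0, proposed_makespan=150.0)
print(round(example, 2))  # 25.0
```

Because makespan is minimized, a lower value for the proposed method gives a positive percentage under this convention.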

Abstract

This paper addresses the flexible job-shop scheduling problem (FJSP) as a constrained combinatorial optimization task with a large discrete action space. Although action-masked reinforcement learning has shown promise for such problems, the effect of structured vector-state encoding in scheduling has received less attention. The main contribution of this work is a structured block-wise state representation and a multi-branch feature extraction module for action-masked Proximal Policy Optimization (PPO). The proposed representation decomposes the scheduling state into three heterogeneous components capturing resource availability, operation readiness, and temporal attributes of operation–machine alternatives. Instead of flattening these signals into a single vector, the proposed encoder processes each block separately before aggregation, with the aim of preserving semantic structure during policy learning. To isolate the effect of representation design, we compare the proposed multi-branch encoder with a baseline single-branch multilayer perceptron under identical PPO hyperparameters and training conditions. Experiments on the Brandimarte MK benchmark suite show that the proposed architecture yields a lower best-achieved makespan on nine of ten instances and improves the best baseline result by up to 27.84%. Additional validation on selected Behnke and Geiger instances indicates that the block-wise encoder’s advantage extends to larger FJSP cases while preserving sub-second inference.
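
The idea of processing each state block in its own branch before aggregation, and then masking invalid actions, can be illustrated with a small sketch. This is not the authors' implementation: the toy instance (2 machines, 3 operations), the block contents, the hidden size, the concatenation step, and all function names are assumptions made for illustration, and the weights are random rather than trained with PPO.

```python
import math
import random

random.seed(0)  # reproducible toy weights


def make_layer(n_in, n_out):
    """Random weight matrix and zero bias (illustrative, untrained)."""
    scale = 1.0 / math.sqrt(n_in)
    weights = [[random.uniform(-scale, scale) for _ in range(n_in)]
               for _ in range(n_out)]
    return weights, [0.0] * n_out


def affine(layer, x):
    """Compute W x + b for one layer."""
    weights, bias = layer
    return [sum(w * v for w, v in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def relu(values):
    return [max(0.0, v) for v in values]


def masked_softmax(logits, mask):
    """Softmax over valid actions only; invalid actions get probability 0."""
    valid = [l for l, ok in zip(logits, mask) if ok]
    if not valid:
        raise ValueError("at least one action must be valid")
    top = max(valid)  # subtract the max for numerical stability
    exps = [math.exp(l - top) if ok else 0.0 for l, ok in zip(logits, mask)]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical toy state with three blocks (values are made up).
machine_block = [0.0, 4.0]                          # time until each machine is free
readiness_block = [1.0, 0.0, 1.0]                   # 1.0 if the operation is ready
proc_times = [[3.0, 5.0], [2.0, 0.0], [0.0, 6.0]]   # 0.0 marks an ineligible machine
temporal_block = [t for row in proc_times for t in row]

HIDDEN = 4  # assumed branch width
blocks = [machine_block, readiness_block, temporal_block]
branches = [make_layer(len(block), HIDDEN) for block in blocks]
policy_head = make_layer(HIDDEN * len(blocks), len(temporal_block))


def policy(blocks, mask):
    """Encode each block in its own branch, concatenate, then score actions."""
    features = []
    for layer, block in zip(branches, blocks):
        features.extend(relu(affine(layer, block)))
    logits = affine(policy_head, features)
    return masked_softmax(logits, mask)


# One action per operation-machine pair; valid only if the operation is
# ready and the machine is eligible for it.
action_mask = [readiness_block[op] > 0 and proc_times[op][m] > 0
               for op in range(len(proc_times))
               for m in range(len(machine_block))]

probs = policy(blocks, action_mask)
```

In this sketch a single-branch baseline would instead feed the concatenation of the three raw blocks into one shared network. The mask guarantees that operation–machine pairs that are not ready or not eligible are never sampled.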
