What question did this study set out to answer?

The aim is to integrate reinforcement learning into production planning to support sustainable practices in manufacturing.

March 25, 2026 | Open Access

Reinforcement Learning for Circular Manufacturing: A Proximal Policy Optimization Approach for Sustainable Production Planning

Key Points

The aim is to integrate reinforcement learning into production planning to support sustainable practices in manufacturing.
Developed a reinforcement learning environment for a manufacturing line producing pet-care products.
Utilized Proximal Policy Optimization to make real-time production decisions.
Designed a reward function that prioritizes waste reduction and material reuse.
The agent adapted quickly to changes in demand.
Achieved significant reductions in surplus materials.
Increased circularity scores progressively throughout the production episodes.
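
To make the reward design in the points above concrete, here is a minimal sketch of a per-step reward that pays for throughput and packaging reuse and penalizes waste and surplus. The study's actual reward terms and weights are not given here, so the field names, the `circular_reward` function, and every weight below are assumptions chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StepOutcome:
    """Hypothetical per-step measurements from the production line."""
    units_produced: float       # finished units this step
    waste_kg: float             # material sent to disposal
    reused_packaging_kg: float  # packaging fed back into the line
    surplus_kg: float           # unused material left in stock


def circular_reward(outcome: StepOutcome,
                    w_throughput: float = 1.0,
                    w_waste: float = 2.0,
                    w_reuse: float = 1.5,
                    w_surplus: float = 0.5) -> float:
    """Weighted sum: reward output and reuse, penalize waste and surplus.

    The weights are illustrative assumptions, not values from the study.
    """
    return (w_throughput * outcome.units_produced
            - w_waste * outcome.waste_kg
            + w_reuse * outcome.reused_packaging_kg
            - w_surplus * outcome.surplus_kg)


baseline = StepOutcome(units_produced=10, waste_kg=1, reused_packaging_kg=2, surplus_kg=4)
less_waste = StepOutcome(units_produced=10, waste_kg=0.5, reused_packaging_kg=2, surplus_kg=4)
print(circular_reward(baseline), circular_reward(less_waste))
```

With a waste weight larger than the throughput weight, an agent under this sketch gains more from avoiding a kilogram of waste than from producing one extra unit, which is one simple way to encode a preference for waste reduction.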

Abstract

This study applies reinforcement learning (RL) as a mechanism to align production planning with the principles of a circular economy. It focuses on a manufacturing line that produces pet-care products. The RL environment developed for this line captures key resource constraints, material reuse, and waste flows. A Proximal Policy Optimization (PPO) agent learns to optimize real-time decisions, trading off production throughput against environmental impact. Its reward function explicitly favors outcomes such as waste minimization and increased packaging reuse. Experimental findings indicate that the agent adjusts quickly to shifting demand, reduces surplus materials, and steadily raises circularity scores over the course of each episode. The framework thus offers a flexible, data-driven tool that industrial engineers can deploy when designing greener production workflows. More broadly, the work illustrates that RL can be embedded in operational systems, advancing the shift to circular manufacturing.
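
For readers unfamiliar with PPO, the sketch below computes the standard clipped surrogate objective that gives the algorithm its name. It illustrates the general technique only: the function name `ppo_clipped_objective`, the sample inputs, and the clipping range of 0.2 (a common default) are assumptions, and the study's own hyperparameters and implementation are not stated in this summary.

```python
from typing import Sequence


def ppo_clipped_objective(ratios: Sequence[float],
                          advantages: Sequence[float],
                          clip_eps: float = 0.2) -> float:
    """Mean clipped surrogate objective over a batch of samples.

    ratios: new-policy probability divided by old-policy probability
            for each sampled action.
    advantages: advantage estimate for each sampled action.
    clip_eps: clipping range (0.2 is an assumed, commonly used value).
    """
    if len(ratios) != len(advantages) or not ratios:
        raise ValueError("ratios and advantages must be non-empty and equal in length")
    terms = []
    for ratio, adv in zip(ratios, advantages):
        # Restrict the ratio to [1 - eps, 1 + eps].
        clipped = max(1.0 - clip_eps, min(ratio, 1.0 + clip_eps))
        # Take the more pessimistic of the unclipped and clipped terms.
        terms.append(min(ratio * adv, clipped * adv))
    return sum(terms) / len(terms)


# A large ratio with a positive advantage is capped at (1 + eps) * advantage.
print(ppo_clipped_objective([1.5], [1.0]))
```

Clipping limits how far a single update can move the policy away from the one that collected the data, which keeps training stable when conditions such as demand shift between episodes.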
