This study develops reinforcement learning (RL) as a mechanism to align production planning with the principles of a circular economy. It focuses on a manufacturing line that produces pet-care products. The developed RL environment captures key resource constraints, material reuse, and waste flows. A Proximal Policy Optimization (PPO) agent learns to optimize real-time decisions, trading off production throughput against environmental impacts. Its reward function explicitly favors outcomes such as waste minimization and increased packaging reuse. Experimental findings indicate that the agent quickly adjusts to shifting demand, reduces surplus materials, and steadily raises circularity scores throughout episodes. The framework thus serves as a flexible, data-driven solution that industrial engineers can deploy when designing greener production workflows. More broadly, the work illustrates that RL can be embedded in operational systems, advancing the shift to circular manufacturing.
Alarcon et al. studied this question.
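
To make the throughput-versus-impact trade-off in the reward concrete, the following is a minimal sketch of how such a reward might be shaped. It is not the reward used in the study: the `StepOutcome` fields, the `circular_reward` function, and all weights are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class StepOutcome:
    """Hypothetical per-step quantities; none are taken from the study."""
    units_produced: float    # throughput in this decision step
    waste_kg: float          # material sent to waste
    packaging_reused: float  # packaging units recovered and reused
    packaging_total: float   # packaging units consumed overall


def circular_reward(outcome, w_throughput=1.0, w_waste=0.5, w_reuse=2.0):
    """Reward throughput, penalize waste, and add a bonus for packaging reuse.

    The weights are illustrative assumptions, not calibrated values.
    """
    if outcome.packaging_total > 0:
        reuse_ratio = outcome.packaging_reused / outcome.packaging_total
    else:
        reuse_ratio = 0.0  # no packaging used, so no reuse bonus
    return (
        w_throughput * outcome.units_produced
        - w_waste * outcome.waste_kg
        + w_reuse * reuse_ratio
    )


# Two hypothetical steps: slightly lower output with far less waste scores higher.
wasteful = StepOutcome(units_produced=100, waste_kg=40,
                       packaging_reused=10, packaging_total=100)
circular = StepOutcome(units_produced=95, waste_kg=5,
                       packaging_reused=80, packaging_total=100)
print(circular_reward(wasteful), circular_reward(circular))
```

Under these assumed weights, a PPO agent maximizing this signal would be pushed toward the second kind of step; in practice the relative weights would have to be tuned against the plant's actual throughput targets.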