The increasing complexity of modern manufacturing systems demands advanced decision-making approaches for production planning and control (PPC). Reinforcement learning (RL), as part of machine learning, has gained attention in recent years due to its ability to learn optimal policies for decision-making through trial-and-error interaction with a dynamic environment. This systematic literature review synthesizes 196 peer-reviewed publications from 2018 to 2024 on RL for PPC. Using an established RL framework, we analyze algorithm families, decision mechanisms, optimization objectives, evaluation practices, and industrial maturity. Results show a strong concentration on operational control, especially dispatching, with increasing adoption of policy-gradient methods and multi-agent formulations. Reward design remains dominated by time-based objectives such as makespan and tardiness, while cost, sustainability, and risk-oriented objectives are mainly treated as secondary terms. We identify a persistent structural gap between academic validation and industrial adoption. The majority of studies validate in synthetic simulations, only a small subset uses real industrial data, and very few connect trained policies to physical testbeds. No reviewed case study reports sustained closed-loop autonomous control in a live production system under continuous operation. We consolidate reported research gaps into an actionable agenda focused on environment fidelity, transfer governance, standardized evaluation, and safety and assurance mechanisms that enable scalable industrial deployment.
Mayerhoff et al. (Thu,) studied this question.