What question did this study set out to answer?

The study asks how joint pricing and inventory decisions in supply chains can be optimized with an autonomous multi-agent reinforcement learning framework.

April 8, 2026

Agentic automation driven reinforcement learning for inventory optimization

Key Points

The aim is to optimize pricing and inventory decisions using an autonomous multi-agent framework.
Developed a Multi-Agent Reinforcement Learning (MARL) framework for supply chain optimization.
Used a Centralized-Training–Decentralized-Execution (CTDE) architecture to coordinate agent behavior.
Combined deep reinforcement learning with inventory theory for decision-making.
Agents are trained on shared global information but execute independent policies for individual SKUs.
The agentic MARL system achieved $524k profit and a 94.9% service level.
The EOQ model yielded a higher profit of $584k but lacked adaptability in dynamic environments.
Other RL methods (PPO and SQL) showed high variability in performance, while traditional methods underperformed.
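
The split between shared training information and independent per-SKU execution can be illustrated with a minimal sketch. This is not the paper's implementation: the class and function names, the base-stock rule standing in for a learned policy, and the overflow penalty are all hypothetical choices made only to show which component sees which information.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class LocalObs:
    inventory: float        # on-hand units for one SKU
    demand_forecast: float  # demand estimate for the same SKU


@dataclass
class SkuAgent:
    """Decentralized actor: decides from its own SKU's observation only."""
    base_stock: float  # hypothetical stand-in for learned policy parameters
    price: float       # hypothetical fixed price chosen by the policy

    def act(self, obs: LocalObs) -> Tuple[float, float]:
        # Order up to the base-stock level; return (order quantity, price).
        order_qty = max(0.0, self.base_stock - obs.inventory)
        return order_qty, self.price


def centralized_value(
    observations: List[LocalObs],
    actions: List[Tuple[float, float]],
    warehouse_capacity: float,
    overflow_penalty: float = 10.0,  # assumed cost per unit over capacity
) -> float:
    """Training-time critic: sees every SKU's observation and action,
    plus the shared warehouse capacity that no single agent observes."""
    revenue = 0.0
    total_stock = 0.0
    for obs, (order_qty, price) in zip(observations, actions):
        available = obs.inventory + order_qty
        revenue += min(available, obs.demand_forecast) * price
        total_stock += available
    overflow = max(0.0, total_stock - warehouse_capacity)
    return revenue - overflow_penalty * overflow


if __name__ == "__main__":
    agents = [SkuAgent(base_stock=100.0, price=5.0),
              SkuAgent(base_stock=50.0, price=8.0)]
    obs = [LocalObs(40.0, 70.0), LocalObs(60.0, 30.0)]
    # Execution: each agent acts independently on its local observation.
    actions = [agent.act(o) for agent, o in zip(agents, obs)]
    # Training: the joint score uses global information.
    print(centralized_value(obs, actions, warehouse_capacity=150.0))
```

In a real CTDE setup the critic would be a learned network used only to compute training signals for the actors, and the actors would be learned policies; the sketch keeps only the information boundary between the two.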

Abstract

Purpose: The purpose of this study is to address the challenge of optimizing joint pricing and inventory decisions in supply chains by proposing an Agentic Automation-driven Multi-Agent Reinforcement Learning (MARL) framework. By embedding autonomy, goal-directed behavior and self-improving capabilities into each decision-making entity, this research overcomes limitations of traditional optimization methods in handling demand heterogeneity, variable lead times, dynamic pricing and shared resources such as warehouse capacity.

Design/methodology/approach: The methodology integrates principles of multi-agent automation within a Centralized-Training–Decentralized-Execution (CTDE) architecture, enabling agents to exhibit proactive, coordinated behavior. Agents are trained using shared global information (e.g. warehouse constraints and cross-product demand patterns) but execute specialized, independent policies for individual Stock Keeping Units (SKUs). The approach combines Deep Reinforcement Learning (RL) with inventory theory to jointly optimize pricing and replenishment decisions under stochastic demand, while enabling autonomous adaptation to changing market conditions.

Findings: The framework is benchmarked against eight popular optimization and learning approaches: Bayesian Optimization, Genetic Algorithm (GA), Evolutionary Algorithm, Deep Q-Networks (DQN), Newsvendor Model, Economic Order Quantity (EOQ), Proximal Policy Optimization (PPO) and Soft Q-Learning (SQL). The results of this study show that the agentic MARL system achieves strong, balanced performance (524k profit, 94.9% service) with robust adaptability. The EOQ model offers higher profit (584k, 98.9% service level) but only in stable environments because of its limited adaptiveness. Other RL methods (PPO and SQL) exhibit high variability, while traditional approaches (GA, rule-based, Bayesian) underperform, lacking the autonomy and learning capacity needed for dynamic business environments.

Originality/value: This study provides: a scalable architecture enabling autonomous, goal-driven coordination for supply chain optimization; empirical evidence showing the advantages of agentic RL over traditional methods in complex, uncertain settings; and foundational insights for extending agentic AI to real-world applications such as promotion planning, supplier collaboration and end-to-end retail automation. Overall, this work bridges academic research and operational practice, providing a pathway toward intelligent, adaptive and agentic supply chain systems.
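
For readers unfamiliar with the two classical inventory baselines named in the findings, the textbook forms of EOQ and the Newsvendor model can be sketched as follows. The abstract does not state how the study parameterized these baselines, so the function names, the normal-demand assumption and the example numbers below are illustrative assumptions, not the paper's setup.

```python
import math
from statistics import NormalDist


def eoq(annual_demand: float, order_cost: float, holding_cost: float) -> float:
    """Textbook Economic Order Quantity: sqrt(2 * D * K / h).

    D = demand per period, K = fixed cost per order,
    h = holding cost per unit per period. Assumes constant, known demand.
    """
    return math.sqrt(2.0 * annual_demand * order_cost / holding_cost)


def newsvendor_quantity(mean: float, std: float,
                        underage_cost: float, overage_cost: float) -> float:
    """Textbook Newsvendor quantity under normally distributed demand (assumed).

    Orders up to the demand quantile at the critical ratio
    cu / (cu + co), where cu is the cost of a lost sale and
    co is the cost of an unsold unit.
    """
    critical_ratio = underage_cost / (underage_cost + overage_cost)
    return NormalDist(mean, std).inv_cdf(critical_ratio)


if __name__ == "__main__":
    # Illustrative numbers only.
    print(eoq(annual_demand=1200, order_cost=100, holding_cost=6))
    print(newsvendor_quantity(mean=100, std=20, underage_cost=9, overage_cost=1))
```

Both formulas fix their inputs in advance: EOQ treats demand as constant and the Newsvendor quantity depends on a fixed demand distribution, which is consistent with the abstract's remark that EOQ performs well only in stable environments.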

Authors

Sarit Maitra

Journals

Journal of Modelling in Management

Institutions

Alliance University
