What question did this study set out to answer?

The study asks how production control in process industries can be optimized with offline reinforcement learning, using a conditional diffusion world model to generate training trajectories instead of risky online trial-and-error exploration.

February 21, 2026 | Open Access

World model-driven process industry operations: An offline reinforcement learning solution based on conditional diffusion

Key Points

Developed a world model-driven framework integrating conditional diffusion with offline reinforcement learning.
Utilized a spatiotemporal Transformer architecture to model dependencies along state-action trajectories.
Employed autoregressive generation for imagined trajectory support in RL agent training.
Adopted a Twin Delayed Deep Deterministic policy gradient (TD3)-based RL model, regularized by behavior cloning, for offline agent training.
Achieved a mean squared error of 1.27e−4 in virtual sample space construction.
Improved product qualification rate by approximately 12% compared to existing offline RL algorithms.
Validated the framework on a real tobacco leaf-processing line, realizing a 17.2% increase in process quality.
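The autoregressive generation of imagined trajectories mentioned in the key points can be sketched as a simple loop: the policy picks an action, the world model predicts the next state and reward from recent history, and the prediction is fed back in as the next input. This is a minimal illustration only. `toy_world_model`, `imagine_rollout`, the context length, and the toy dynamics are all assumptions made for this example and are not taken from the paper or its repository; a real world model would run conditional reverse diffusion where the stub applies a linear update.

```python
import random
from typing import Callable, List, Sequence, Tuple

State = List[float]
Action = List[float]
Transition = Tuple[State, Action, float, State]


def toy_world_model(history: Sequence[Tuple[State, Action]],
                    rng: random.Random) -> Tuple[State, float]:
    """Hypothetical stand-in for a conditional diffusion sampler.

    Conditions on the latest (state, action) pair and returns a next
    state and a reward. The linear dynamics here are invented for
    illustration only.
    """
    state, action = history[-1]
    next_state = [0.9 * s + 0.1 * a + rng.gauss(0.0, 0.01)
                  for s, a in zip(state, action)]
    reward = -sum(x * x for x in next_state)  # toy reward: stay near zero
    return next_state, reward


def imagine_rollout(world_model: Callable, policy: Callable[[State], Action],
                    start_state: State, horizon: int,
                    context_len: int = 4, seed: int = 0) -> List[Transition]:
    """Roll the world model forward, feeding each prediction back in."""
    rng = random.Random(seed)
    history: List[Tuple[State, Action]] = []
    transitions: List[Transition] = []
    state = list(start_state)
    for _ in range(horizon):
        action = policy(state)
        history.append((state, action))
        history = history[-context_len:]  # keep a bounded conditioning window
        next_state, reward = world_model(history, rng)
        transitions.append((state, action, reward, next_state))
        state = next_state  # autoregressive step
    return transitions


def policy(state: State) -> Action:
    """Hypothetical proportional controller used only to drive the demo."""
    return [-0.5 * x for x in state]


rollout = imagine_rollout(toy_world_model, policy, [1.0, -1.0], horizon=5)
print(len(rollout), "imagined transitions")
```

The resulting `(state, action, reward, next_state)` tuples have the same shape as real logged transitions, which is what lets an offline agent train on them.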

Abstract

Production control optimization in process industries is often challenged by complex physicochemical processes that are difficult to model mathematically. As a model-free approach, Reinforcement Learning (RL) offers a promising solution. However, online action exploration through trial-and-error risks compromising equipment safety and efficiency, while offline training suffers from sampling bias due to limited and imbalanced datasets, particularly the scarcity of faulty operation data. To address these issues, this study proposes a world model-driven operational framework that integrates conditional diffusion with offline RL. By leveraging the distribution approximation capability of diffusion models, we introduce a conditional trajectory generation mechanism constrained by operational parameters and historical state transitions. This allows the diffusion model to produce near-realistic state trajectories and reward signals, constructing an interactive virtual state–action–reward space. We further employ autoregressive generation of imagined trajectories to support RL agent training. During world model training, a spatiotemporal Transformer architecture is incorporated to capture dependencies along state–action trajectories. For offline agent training, a Twin-Delayed Deep Deterministic policy gradient-based RL model regularized by behavior cloning is adopted. Experiments on a tobacco leaf-processing line demonstrate that the proposed conditional diffusion-based offline RL method accurately constructs a virtual sample space with a mean squared error of 1.27e−4, significantly reducing policy acquisition costs. The resulting RL-driven parameter adjustment achieves an approximately 12% improvement in the product qualification rate compared to other state-of-the-art offline RL algorithms. Our algorithm implementation and evaluation dataset are available at https://github.com/sizizuo0076/WM-PIO-ORL.
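To make the diffusion component concrete, the sketch below shows the standard denoising-diffusion training step: a clean sample is noised to a chosen timestep, and a network is trained to predict that noise. It assumes the common linear noise schedule and mean-squared noise-prediction loss; the abstract does not state which schedule or loss the authors use, so the schedule values, function names, and the zero-valued placeholder prediction are all illustrative assumptions. In the conditional setting described above, the predictor would also receive the operational parameters and historical transitions as input.

```python
import math
import random
from typing import List, Sequence, Tuple


def linear_beta_schedule(num_steps: int, beta_start: float = 1e-4,
                         beta_end: float = 0.02) -> List[float]:
    """Linearly spaced noise variances (an assumed, commonly used schedule)."""
    if num_steps == 1:
        return [beta_start]
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]


def cumulative_alphas(betas: Sequence[float]) -> List[float]:
    """alpha_bar_t = product over s <= t of (1 - beta_s)."""
    out, prod = [], 1.0
    for beta in betas:
        prod *= 1.0 - beta
        out.append(prod)
    return out


def add_noise(x0: Sequence[float], t: int, alpha_bars: Sequence[float],
              rng: random.Random) -> Tuple[List[float], List[float]]:
    """Forward process: x_t = sqrt(alpha_bar_t) * x0 + sqrt(1 - alpha_bar_t) * eps."""
    a = alpha_bars[t]
    eps = [rng.gauss(0.0, 1.0) for _ in x0]
    x_t = [math.sqrt(a) * x + math.sqrt(1.0 - a) * e for x, e in zip(x0, eps)]
    return x_t, eps


def noise_prediction_loss(predicted: Sequence[float],
                          true: Sequence[float]) -> float:
    """Mean squared error between predicted and actual noise."""
    return sum((p - e) ** 2 for p, e in zip(predicted, true)) / len(true)


rng = random.Random(0)
alpha_bars = cumulative_alphas(linear_beta_schedule(100))
x0 = [0.5, -0.2, 0.1]  # hypothetical flattened trajectory segment
x_t, eps = add_noise(x0, t=50, alpha_bars=alpha_bars, rng=rng)
# A trained network would predict eps from (x_t, t, condition);
# zeros stand in for its output here.
loss = noise_prediction_loss([0.0] * len(eps), eps)
```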
• A diffusion-based world model is proposed to guide offline decision agent training.
• A spatiotemporal Transformer is used for noise prediction in the diffusion model.
• Reinforcement learning with behavior cloning is designed for continuous control.
• The proposed framework is validated on a real tobacco shredding production line.
• The validation shows a 17.2% quality improvement for process production control.
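For readers unfamiliar with behavior-cloning regularization, the sketch below computes the actor objective of the widely used TD3+BC formulation: a normalized Q-value term that pushes the policy toward high-value actions, plus a squared-error term that keeps it close to actions seen in the dataset. The paper's own variant may differ in detail; the function name, the default `alpha = 2.5`, and the toy numbers are assumptions for illustration.

```python
from typing import List, Sequence


def td3_bc_actor_loss(q_values: Sequence[float],
                      policy_actions: Sequence[Sequence[float]],
                      dataset_actions: Sequence[Sequence[float]],
                      alpha: float = 2.5) -> float:
    """Actor loss in the standard TD3+BC form (lower is better).

    q_values:        critic estimates Q(s, pi(s)) for each sample
    policy_actions:  actions pi(s) proposed by the current policy
    dataset_actions: actions actually logged in the offline dataset
    """
    n = len(q_values)
    # Normalize the Q term so that alpha is insensitive to the reward scale.
    mean_abs_q = sum(abs(q) for q in q_values) / n
    lam = alpha / max(mean_abs_q, 1e-8)
    rl_term = -lam * sum(q_values) / n
    # Behavior cloning term: mean squared distance to logged actions.
    bc_term = sum(
        sum((p - a) ** 2 for p, a in zip(pa, da)) / len(pa)
        for pa, da in zip(policy_actions, dataset_actions)
    ) / n
    return rl_term + bc_term


# Toy batch of two one-dimensional actions (values invented for the demo).
loss = td3_bc_actor_loss(
    q_values=[1.0, 3.0],
    policy_actions=[[0.0], [1.0]],
    dataset_actions=[[0.0], [0.0]],
)
print(loss)  # -2.0: RL term is -2.5, behavior cloning penalty is 0.5
```

A larger `alpha` weights the value term more heavily, while a smaller one keeps the policy closer to the logged operating behavior, which matters when out-of-distribution actions could be unsafe on real equipment.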

Authors

Yanlei Yin

Raymond Chiong

Chao Deng

Journals

Computers in Industry

Institutions

Nanyang Technological University

University of Newcastle Australia

Kunming University of Science and Technology
