What question did this study set out to answer?

This research aims to enhance the stability of reinforcement learning algorithms for steady-state chemical processes by minimizing action noise.

May 16, 2026 | Open Access

Suppressing High-Frequency Action Noise in DRL-Based Process Control: A Dual Strategy for Thermal Regeneration Column

Key Points

This research aims to enhance the stability of reinforcement learning algorithms for steady-state chemical processes by minimizing action noise.
Implemented the Soft Actor-Critic (SAC) algorithm as the baseline for control.
Introduced three strategies: an action-amplitude-constrained reward function, a low-pass filter, and a Kalman filter.
Conducted experiments on a thermal regeneration column to evaluate performance improvements.
Reduced the fluctuation amplitude of steam consumption by 85.50% relative to the baseline SAC (p<0.001).
Decreased the fluctuation amplitude of cooling water consumption by 82.81% relative to the baseline SAC (p<0.001).
Achieved a 90.84% reduction in the fluctuation amplitude of sulfur concentration relative to the baseline SAC (p<0.001).
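
The summary does not give the exact form of the reward constraint or the low-pass filter, so the following is only a minimal sketch of what two of the listed strategies could look like. The quadratic penalty on the change in action, the first-order exponential filter, and the names `amplitude_constrained_reward`, `LowPassFilter`, `penalty_weight`, and `alpha` are all assumptions made for illustration, not the authors' formulation.

```python
import numpy as np


def amplitude_constrained_reward(base_reward, action, prev_action, penalty_weight=0.1):
    """Illustrative reward shaping (assumed form, not taken from the paper).

    Subtracts a penalty proportional to the squared step-to-step change in
    the action, so large jumps between consecutive actions are discouraged.
    """
    delta = np.asarray(action, dtype=float) - np.asarray(prev_action, dtype=float)
    return float(base_reward - penalty_weight * np.sum(delta ** 2))


class LowPassFilter:
    """First-order exponential low-pass filter applied to the policy output.

    y_t = alpha * u_t + (1 - alpha) * y_{t-1}
    A smaller alpha smooths more strongly but adds more lag.
    """

    def __init__(self, alpha=0.2):
        if not 0.0 < alpha <= 1.0:
            raise ValueError("alpha must be in (0, 1]")
        self.alpha = alpha
        self.state = None  # last filtered action; None until the first call

    def __call__(self, action):
        action = np.asarray(action, dtype=float)
        if self.state is None:
            # Initialize with the first action to avoid a start-up transient.
            self.state = action.copy()
        else:
            self.state = self.alpha * action + (1.0 - self.alpha) * self.state
        return self.state.copy()
```

In a setup like this, the shaped reward would be used during training, while the filter would sit between the policy output and the actuator. Where exactly each piece is applied in the study is not stated in this summary.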

Abstract

Stochastic-policy reinforcement learning (RL) algorithms are widely used in industrial control because of their strong exploration ability and high sample efficiency. However, they often produce large, noisy action fluctuations, which makes them ill-suited to steady-state chemical processes. To address this problem, this study takes a thermal regeneration column (TRC) as the test case and uses the Soft Actor-Critic (SAC) algorithm as the baseline. Three strategies are introduced to improve SAC: an action-amplitude-constrained reward function, a low-pass filter, and a Kalman filter. Experimental results show that combining the action-amplitude-constrained reward function with the Kalman filter achieves the best performance. Compared with the standard SAC algorithm, the fluctuation amplitudes of steam consumption, cooling water consumption, sulfur concentration, and methanol makeup rate are reduced by 85.50%, 82.81%, 90.84%, and 85.49%, respectively. In addition, the fluctuation amplitude of the reward decreases by 90.68%. The method not only optimizes operating costs but also ensures stable operation of the TRC.
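
To make the Kalman-filter idea concrete, here is a hedged sketch of a per-dimension scalar Kalman filter that smooths a noisy action sequence. It assumes a random-walk model for the underlying action and treats the raw policy output as a noisy measurement. The class `ActionKalmanFilter`, the helper `fluctuation_reduction`, the variance values, and the use of standard deviation as the "fluctuation amplitude" are all illustrative assumptions, since the abstract does not specify the filter model or the metric definition.

```python
import numpy as np


class ActionKalmanFilter:
    """Independent scalar Kalman filter per action dimension (assumed model).

    State model:       x_t = x_{t-1} + w,  w ~ N(0, process_var)
    Measurement model: z_t = x_t + v,      v ~ N(0, measurement_var)
    where z_t is the raw action sampled from the stochastic policy.
    """

    def __init__(self, dim, process_var=1e-3, measurement_var=1e-1):
        self.q = process_var
        self.r = measurement_var
        self.x = np.zeros(dim)  # filtered action estimate
        self.p = np.ones(dim)   # estimate variance
        self.initialized = False

    def update(self, raw_action):
        z = np.asarray(raw_action, dtype=float)
        if not self.initialized:
            # Start from the first raw action instead of zero.
            self.x = z.copy()
            self.initialized = True
            return self.x.copy()
        p_pred = self.p + self.q            # predict step (random walk)
        gain = p_pred / (p_pred + self.r)   # Kalman gain
        self.x = self.x + gain * (z - self.x)
        self.p = (1.0 - gain) * p_pred
        return self.x.copy()


def fluctuation_reduction(raw, smoothed):
    """Percent reduction in standard deviation (an assumed amplitude metric)."""
    return 100.0 * (1.0 - np.std(smoothed) / np.std(raw))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Synthetic noisy actions around a constant setpoint; not data from the study.
    raw = 0.5 + 0.1 * rng.standard_normal((500, 1))
    kf = ActionKalmanFilter(dim=1)
    smoothed = np.array([kf.update(a) for a in raw])
    print(f"std reduction: {fluctuation_reduction(raw, smoothed):.1f}%")
```

The ratio of `process_var` to `measurement_var` sets the trade-off: a smaller ratio suppresses more noise but makes the filtered action respond more slowly to genuine setpoint changes.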
