What does this research mean for the field?

A deep reinforcement learning-based method reduces head deviation in hot-strip production by about 30%, improving flatness quality and operational stability. Novelty: novel finding. Consensus alignment: neutral.

What question did this study set out to answer?

The aim is to improve head deviation control during hot-strip production using deep reinforcement learning methods.

March 10, 2026

Research on Head Deviation Control Method of Finishing Strip Based on Deep Reinforcement Learning

Key Points

Set out to improve head deviation control during hot-strip production using deep reinforcement learning
Developed a framework for process parameters affecting head deviation
Employed a PSO-SVR model to predict deviation based on selected variables
Implemented a DQN under a Markov Decision Process to learn the control policy
Tested the method in an industrial hot-strip mill under real production conditions
Achieved an average reduction in head deviation of about 30% across multiple steel grades
Improved flatness quality of produced strips
Enhanced operational stability of the finishing rolling process
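
The PSO-SVR step above pairs a particle swarm search with a regression model. The summary gives no implementation details, so the following is only a minimal sketch of the swarm half in plain Python. The function names, the swarm coefficients, and the toy objective are all assumptions for illustration; in a real PSO-SVR setup the objective would be the cross-validated prediction error of an SVR trained with the candidate hyperparameters.

```python
import random


def pso_minimize(objective, bounds, n_particles=20, n_iters=60,
                 inertia=0.7, c_personal=1.5, c_global=1.5, seed=0):
    """Minimise objective(position) over box bounds with basic global-best PSO."""
    rng = random.Random(seed)
    dim = len(bounds)
    # Random initial positions inside the box, zero initial velocities.
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    best_pos = [p[:] for p in pos]               # each particle's best position
    best_val = [objective(p) for p in pos]
    g = min(range(n_particles), key=best_val.__getitem__)
    g_pos, g_val = best_pos[g][:], best_val[g]   # swarm-wide best so far

    for _ in range(n_iters):
        for i in range(n_particles):
            for d, (lo, hi) in enumerate(bounds):
                r1, r2 = rng.random(), rng.random()
                # Velocity: inertia + pull to personal best + pull to global best.
                vel[i][d] = (inertia * vel[i][d]
                             + c_personal * r1 * (best_pos[i][d] - pos[i][d])
                             + c_global * r2 * (g_pos[d] - pos[i][d]))
                # Move, then clamp the coordinate to the search box.
                pos[i][d] = min(hi, max(lo, pos[i][d] + vel[i][d]))
            val = objective(pos[i])
            if val < best_val[i]:
                best_pos[i], best_val[i] = pos[i][:], val
                if val < g_val:
                    g_pos, g_val = pos[i][:], val
    return g_pos, g_val


# Hypothetical stand-in objective: a smooth bowl over (log C, log gamma).
# It is NOT an SVR validation error; it only exercises the search loop.
def toy_validation_error(params):
    log_c, log_gamma = params
    return (log_c - 1.0) ** 2 + (log_gamma + 2.0) ** 2


best_params, best_error = pso_minimize(toy_validation_error,
                                       bounds=[(-3.0, 3.0), (-5.0, 1.0)])
```

Searching in log space is a common choice for SVR hyperparameters because useful values of C and gamma often span several orders of magnitude; whether the study did so is not stated here.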

Abstract

During hot‐strip production, lateral head off‐tracking at the finisher exit degrades strip quality and destabilizes line operation, forming a key bottleneck to further improvements in flatness accuracy and unmanned operation. To address the issues of response latency, tuning complexity, and limited adaptability inherent in manual control, this study proposes a deep reinforcement learning (DRL)‐based method for head deviation control in finishing rolling. Targeting higher control accuracy and system stability, we first define a process‐parameter framework governing off‐tracking and select representative variables—strip width and thickness, interstand rolling‐force asymmetry, work‐roll‐bending force, flow‐stress coefficient, and historical deviation records. On this basis, a particle swarm optimization–support vector regression (PSO–SVR) model is employed to predict deviation, providing feedforward information to the controller. We then introduce a deep Q‐network (DQN), formulated under a Markov Decision Process (MDP), to learn the control policy; closed‐loop regulation is achieved by adjusting key actuator setpoints (e.g., preset gap differential). The method was deployed and tested on an industrial 2250 mm hot‐strip mill under real production conditions. Across multiple steel grades, it reduced head deviation by about 30% on average, improving flatness quality and operational stability. These results provide a practical basis and engineering reference for intelligent control of strip rolling and indicate strong prospects for broad industrial adoption.
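
To make the MDP formulation concrete, here is a minimal sketch that uses tabular Q-learning as a stand-in for the deep Q-network. A DQN replaces the table with a neural network (typically with experience replay and a target network), but the temporal-difference update is the same idea. The state bins, actions, reward, and plant model below are hypothetical and far simpler than a real finishing mill; none of them come from the study.

```python
import random

STATES = [-2, -1, 0, 1, 2]   # hypothetical discretised head-deviation bins
ACTIONS = [-1, 0, 1]         # hypothetical nudges to the preset gap differential


def step(state, action):
    """Toy deterministic plant: the action shifts the deviation by one bin."""
    next_state = max(-2, min(2, state + action))
    reward = -abs(next_state)  # penalise whatever deviation remains
    return next_state, reward


def train(episodes=500, steps=10, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    """Learn Q(s, a) with epsilon-greedy exploration and one-step TD updates."""
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in STATES for a in ACTIONS}
    for _ in range(episodes):
        s = rng.choice(STATES)  # each episode starts from a random deviation
        for _ in range(steps):
            if rng.random() < epsilon:
                a = rng.choice(ACTIONS)                         # explore
            else:
                a = max(ACTIONS, key=lambda act: q[(s, act)])   # exploit
            s_next, r = step(s, a)
            # Bellman target: immediate reward plus discounted best next value.
            target = r + gamma * max(q[(s_next, act)] for act in ACTIONS)
            q[(s, a)] += alpha * (target - q[(s, a)])
            s = s_next
    return q


q_table = train()
# Greedy policy: the action with the highest learned value in each state.
policy = {s: max(ACTIONS, key=lambda act: q_table[(s, act)]) for s in STATES}
```

In this toy setting the greedy policy steps toward the zero-deviation bin from either side and holds once it is there, which mirrors the closed-loop regulation described in the abstract at a much smaller scale.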


Cite This Study

Wang et al. studied this question.

synapsesocial.com/papers/69af94da70916d39fea4bca3 https://doi.org/10.1002/srin.202501194
