During fluororubber production, vigorous material agitation and agglomeration cause severe dynamic fluctuations, irregular surface morphology, and pronounced variations in the apparent material level. Under these operating conditions, conventional single-modality monitoring approaches, such as point-based height sensors or manual visual inspection, often fail to capture the true process state reliably. This information deficiency leads to inaccurate valve opening adjustment and degrades material level control performance. To address this issue, valve opening prediction is formulated as a data-driven, control-oriented regression task for material level regulation, and an end-to-end multimodal temporal regression framework, termed MECFN (Multi-Modal Enhanced Cross-Fusion Network), is proposed. The model performs deep fusion of visual image sequences and height sensor signals. A customized Multi-Feature Extraction (MFE) module is designed to enhance visual feature representation under complex surface conditions, while two independent Transformer encoders capture long-range temporal dependencies within each modality. Furthermore, a context-aware cross-attention mechanism enables interaction and adaptive fusion between the heterogeneous modalities. Experimental validation on a real-world industrial fluororubber production dataset shows that MECFN consistently outperforms traditional machine learning approaches and single-modality deep learning models in valve opening prediction, achieving a mean absolute error (MAE) of 2.36, a root mean squared error (RMSE) of 3.73, and a coefficient of determination (R²) of 0.92. These results indicate that the proposed framework provides a robust and practical data-driven solution for supporting valve control and achieving stable material level regulation in industrial production environments.
Yan et al. studied this question.
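To make the cross-modal fusion step more concrete, the following is a minimal sketch of plain scaled dot-product cross-attention, in which features from one modality act as queries over features from the other. It is not the MECFN implementation: the abstract does not specify how the context-aware mechanism is built, so the function names (`softmax`, `cross_attention`), the absence of learned projection matrices, and the toy feature values are all assumptions made for illustration.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries, keys, values):
    """Scaled dot-product cross-attention (illustrative, no learned weights).

    queries: list of T_q vectors from one modality (e.g. height-sensor features)
    keys, values: lists of T_k vectors from the other modality (e.g. visual features)
    Returns (fused, weights): T_q fused vectors and the T_q x T_k attention weights.
    """
    dim = len(queries[0])
    scale = 1.0 / math.sqrt(dim)
    fused, weights = [], []
    for q in queries:
        # Similarity of this query to every key, scaled by 1/sqrt(dim).
        scores = [scale * sum(qi * ki for qi, ki in zip(q, k)) for k in keys]
        attn = softmax(scores)
        # Attention-weighted average of the value vectors.
        out = [sum(a * v[j] for a, v in zip(attn, values))
               for j in range(len(values[0]))]
        fused.append(out)
        weights.append(attn)
    return fused, weights


# Hypothetical per-time-step features, not data from the paper.
height_feats = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # 3 time steps, dim 2
visual_feats = [[1.0, 0.0], [0.0, 1.0]]              # 2 time steps, dim 2

fused, weights = cross_attention(height_feats, visual_feats, visual_feats)
```

In this toy setup each height-sensor time step receives a weighted mixture of the visual features, with larger weights on the visual steps it most resembles. A full model would presumably apply learned query, key, and value projections to the Transformer encoder outputs before this step and pass the fused representation to a regression head.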