To meet the quality control requirements of automated pharmaceutical preparation production lines, this paper proposes a quality prediction model that integrates multi-source heterogeneous data: time series, images, and text. To reconcile the different sampling frequencies of sensor, camera, and log data, a spatio-temporal alignment algorithm based on linear interpolation is designed. A three-layer hybrid architecture, "1D-CNN + Transformer + XGBoost", is constructed, in which an attention mechanism automatically learns cross-modal weights to achieve deep feature fusion. An embedded online learning module based on KL divergence monitors data distribution drift in real time and triggers incremental updates. Validated on actual production line data from 150 batches of vitamin C effervescent tablets, the model achieves a root mean square error (RMSE) of 2.08 N for tablet hardness prediction, a mean absolute percentage error (MAPE) of 3.72%, and an accuracy of 98%. This corresponds to a 21.9% reduction in error relative to the best baseline model and enables early warning of quality deviations. Ablation experiments show that attention fusion and multimodal information are key to the performance improvement. The method provides a high-precision, interpretable, and deployable data-driven solution for advanced quality control in the pharmaceutical industry.
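As a rough illustration of two of the building blocks named above, the following minimal sketch resamples a slower data stream onto a faster time grid by linear interpolation and flags distribution drift with a histogram-based KL divergence estimate. It is not the authors' implementation: the function names (`align_to_grid`, `kl_divergence`, `drift_detected`), the bin count, the add-one smoothing, and the threshold of 0.1 are all assumptions chosen for illustration.

```python
import numpy as np


def align_to_grid(t_src, x_src, t_grid):
    """Linearly interpolate a 1-D signal sampled at t_src onto t_grid."""
    return np.interp(t_grid, t_src, x_src)


def kl_divergence(ref, cur, bins=20):
    """Histogram-based estimate of KL(ref || cur) over shared bin edges.

    Add-one smoothing (an assumption of this sketch) keeps every bin
    probability strictly positive so the logarithm is always defined.
    """
    lo = min(ref.min(), cur.min())
    hi = max(ref.max(), cur.max())
    p, _ = np.histogram(ref, bins=bins, range=(lo, hi))
    q, _ = np.histogram(cur, bins=bins, range=(lo, hi))
    p = (p + 1.0) / (p + 1.0).sum()
    q = (q + 1.0) / (q + 1.0).sum()
    return float(np.sum(p * np.log(p / q)))


def drift_detected(ref, cur, threshold=0.1):
    """Return True when the estimated divergence exceeds the threshold.

    The threshold is hypothetical; in practice it would be tuned on
    historical batches.
    """
    return kl_divergence(ref, cur) > threshold


# Example: a 2 Hz feature stream resampled onto a 10 Hz sensor grid.
t_slow = np.array([0.0, 0.5, 1.0])
x_slow = np.array([0.0, 1.0, 2.0])
t_fast = np.linspace(0.0, 1.0, 11)
aligned = align_to_grid(t_slow, x_slow, t_fast)

# Example: compare a reference window against a stable and a shifted window.
rng = np.random.default_rng(0)
reference = rng.normal(0.0, 1.0, 5000)
stable = rng.normal(0.0, 1.0, 5000)
shifted = rng.normal(2.0, 1.0, 5000)
print(drift_detected(reference, stable), drift_detected(reference, shifted))
```

In a deployment along the lines described, a positive drift check on a recent window would presumably be the event that triggers the incremental model update; how the window is chosen and how the update is performed are not specified here.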
Jia et al. (Sun,) studied this question.