Abstract The high-frequency leak detection data generated by SCADA systems has caused a surge in data volume, making it difficult to store and use time series data efficiently. To improve the integrated storage and management of multi-source data in oil and gas pipeline SCADA systems, this study examines the application of real-time data lake technology to such systems and proposes a data management method based on it. The method targets the surge in data volume caused by high-frequency leak detection data. The leak detection time series data generated by the SCADA system is written directly, in real time, into the OpenTSDB time series database of the data lake, which enables rapid storage and ensures that the data is timely and available. To improve data utilization, the data lake classifies multi-source data with a data mining algorithm based on an improved K-means algorithm and stores it in fused form with a multi-source data fusion storage method based on the BIRCH algorithm. In testing, the data lake technology applied to the SCADA system showed good stability under large data volumes and high concurrency. The average time for pipeline leak detection and localization using SCADA system data was about 0.275 s, which is 0.125 s lower than expected, indicating good leak detection efficiency.
Huang et al. (Thu,) studied this question.
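The real-time write path described in the abstract, where leak detection readings go straight into OpenTSDB, can be illustrated with a minimal sketch. It only builds the JSON body that OpenTSDB's `/api/put` HTTP endpoint accepts (data points with `metric`, `timestamp`, `value`, and `tags`) and does not send it. The metric name, tag keys, and sample readings are hypothetical and do not come from the paper, and this is not presented as the authors' implementation.

```python
import json


def build_put_payload(readings, metric="pipeline.leak.pressure"):
    """Turn (timestamp, value, tags) readings into OpenTSDB data points.

    The metric name and tag keys are illustrative assumptions only.
    """
    points = []
    for ts, value, tags in readings:
        if not tags:
            # OpenTSDB requires at least one tag on every data point.
            raise ValueError("each data point needs at least one tag")
        points.append({
            "metric": metric,
            "timestamp": int(ts),   # Unix epoch seconds
            "value": float(value),
            "tags": dict(tags),
        })
    return points


# Hypothetical readings from one pressure sensor, one second apart.
readings = [
    (1700000000, 4.82, {"station": "S01", "sensor": "p_in"}),
    (1700000001, 4.79, {"station": "S01", "sensor": "p_in"}),
]

# JSON body that a client would POST to the /api/put endpoint.
body = json.dumps(build_put_payload(readings))
```

Batching several points into one request body, as here, is a common way to reduce per-request overhead for high-frequency sensor data. Whether the paper's system batches its writes is not stated in the abstract.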