May 29, 2024

Akane: Perplexity-Guided Time Series Data Cleaning

Abstract

Dirty data are prevalent in time series, such as energy consumption or stock data. Existing data cleaning algorithms fall short in identifying dirty data and often make unsatisfactory cleaning decisions. To address these drawbacks, we leverage the inherent recurrent patterns in time series, treat them as analogous to fixed combinations in textual data, and incorporate the concept of perplexity. The cleaning problem is thus transformed into minimizing the perplexity of the time series under a given cleaning cost, and we design a four-phase algorithmic framework to tackle it. To ensure the framework's feasibility, we also briefly analyze the impact of dirty data and devise an automatic budget selection strategy. Moreover, to make the framework more generic, we introduce advanced solutions, including an improved probability calculation method grounded in homomorphic pattern aggregation and a greedy heuristic algorithm for resource savings. Experiments on 12 real-world datasets demonstrate the superiority of our methods.
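
To make the perplexity idea concrete, the following is a minimal sketch and not the paper's method: the abstract does not specify how probabilities are computed, so the equal-width discretization, the add-one-smoothed bigram model, and the names `discretize` and `bigram_perplexity` are all assumptions chosen for illustration. It shows only that a single out-of-pattern value raises the perplexity of an otherwise recurrent series.

```python
import math
from collections import Counter


def discretize(values, n_bins, lo, hi):
    """Map each value to an integer bin index in [0, n_bins - 1].

    Equal-width binning is an assumption of this sketch.
    """
    width = (hi - lo) / n_bins
    return [min(n_bins - 1, max(0, int((v - lo) / width))) for v in values]


def bigram_perplexity(symbols, n_bins):
    """Perplexity of a symbol sequence under a bigram model.

    The model is estimated from the sequence itself with add-one
    smoothing (a hypothetical stand-in for the paper's probability
    calculation). Perplexity = exp(-mean log p(x_t | x_{t-1})).
    """
    pair_counts = Counter(zip(symbols, symbols[1:]))
    prev_counts = Counter(symbols[:-1])
    log_prob = 0.0
    for prev, cur in zip(symbols, symbols[1:]):
        p = (pair_counts[(prev, cur)] + 1) / (prev_counts[prev] + n_bins)
        log_prob += math.log(p)
    return math.exp(-log_prob / (len(symbols) - 1))


# A strictly recurrent toy series, and a copy with one injected spike.
clean = [0.0, 1.0, 2.0, 1.0] * 25
dirty = list(clean)
dirty[50] = 9.0

N_BINS, LO, HI = 10, 0.0, 10.0
ppl_clean = bigram_perplexity(discretize(clean, N_BINS, LO, HI), N_BINS)
ppl_dirty = bigram_perplexity(discretize(dirty, N_BINS, LO, HI), N_BINS)

print(f"clean perplexity: {ppl_clean:.3f}")
print(f"dirty perplexity: {ppl_dirty:.3f}")
```

In this toy setting, restoring `dirty[50]` to `2.0` brings the perplexity back to the clean value, which is the intuition behind framing cleaning as perplexity minimization under a cost budget.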

Authors

Xiaoyu Han

Haoran Xiong

Zhenying He

Journals

Proceedings of the ACM on Management of Data

Institutions

Tsinghua University

Fudan University
