Recent years have witnessed increasing interest in designing algorithms for querying and analyzing streaming data (i.e., data that is seen only once, in a fixed order) with only limited memory. Providing (perhaps approximate) answers to queries over such continuous data streams is a crucial requirement for many application environments; examples include large telecom and IP network installations, where performance data from different parts of the network needs to be continuously collected and analyzed.

In this paper, we consider the problem of approximately answering general aggregate SQL queries over continuous data streams with limited memory. Our method relies on randomized techniques that compute small "sketch" summaries of the streams; these sketches can then be used to provide approximate answers to aggregate queries with provable guarantees on the approximation error. We also demonstrate how existing statistical information on the base data (e.g., histograms) can be used in the proposed framework to improve the quality of the approximation provided by our algorithms. The key idea is to intelligently partition the domain of the underlying attribute(s) and thus decompose the sketching problem in a way that provably tightens our guarantees. Results of our experimental study with real-life as well as synthetic data streams indicate that sketches provide significantly more accurate answers than histograms for aggregate queries. This is especially true when our domain partitioning methods are employed to further boost the accuracy of the final estimates.
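To make the sketching and domain-partitioning ideas concrete, here is a minimal sketch for one aggregate, the size of an equi-join (`COUNT(*)` over `F JOIN G` on a shared attribute). It assumes AMS-style atomic sketches: each stream keeps counters X = Σ_v f(v)·ξ(v), where ξ are ±1 variables from a 4-wise independent family (here, hypothetically, a random cubic polynomial modulo the Mersenne prime 2^61 − 1, with the sign taken from the low bit), and the product X_F·X_G is an unbiased estimate of Σ_v f(v)·g(v). Copies are combined by a median of means. Partitioning gives each domain part its own independent ξ family and sums the per-part products, which removes cross-partition terms from the variance. All function names (`make_sign`, `make_families`, `build_sketch`, `estimate_join`, `exact_join`) and the split point at 10 are illustrative assumptions. In particular, the paper picks partitions using histogram information, which this code does not attempt.

```python
import random
import statistics
from collections import Counter

P = (1 << 61) - 1  # Mersenne prime; field for the (assumed) polynomial hash


def make_sign(rng):
    """Hypothetical helper: +/-1 variable xi(v) from a random cubic polynomial
    mod P (4-wise independent on distinct inputs); low bit picks the sign."""
    coeffs = [rng.randrange(P) for _ in range(4)]

    def xi(v):
        h = 0
        for c in coeffs:  # Horner evaluation of the cubic polynomial
            h = (h * v + c) % P
        return 1 if h & 1 else -1

    return xi


def make_families(n_parts, n_copies, seed):
    """Independent sign families for every (partition, sketch copy) pair."""
    rng = random.Random(seed)
    return [[make_sign(rng) for _ in range(n_copies)] for _ in range(n_parts)]


def build_sketch(stream, families, part):
    """Single pass over the stream: a tuple with join value v in partition p
    adds xi_{p,k}(v) to every counter k of that partition."""
    counters = [[0] * len(fams) for fams in families]
    for v in stream:
        p = part(v)
        for k, xi in enumerate(families[p]):
            counters[p][k] += xi(v)
    return counters


def estimate_join(cf, cg, s1, s2):
    """Median of s2 means, each over s1 copies of sum_p X_F,p * X_G,p.
    Both sketches must be built with the same families."""
    z = [sum(cf[p][k] * cg[p][k] for p in range(len(cf))) for k in range(s1 * s2)]
    means = [statistics.fmean(z[j * s1:(j + 1) * s1]) for j in range(s2)]
    return statistics.median(means)


def exact_join(f, g):
    """Ground truth: sum over values v of f(v) * g(v)."""
    cf, cg = Counter(f), Counter(g)
    return sum(n * cg[v] for v, n in cf.items())


def whole_domain(v):
    return 0  # a single partition: plain, unpartitioned sketching


def split_at_10(v):
    return 0 if v < 10 else 1  # arbitrary two-way split of the domain


if __name__ == "__main__":
    data_rng = random.Random(1)
    f = [data_rng.randint(0, 19) for _ in range(1000)]
    g = [data_rng.randint(0, 19) for _ in range(1000)]
    s1, s2 = 16, 3
    one = make_families(1, s1 * s2, seed=7)
    two = make_families(2, s1 * s2, seed=7)
    est1 = estimate_join(build_sketch(f, one, whole_domain),
                         build_sketch(g, one, whole_domain), s1, s2)
    est2 = estimate_join(build_sketch(f, two, split_at_10),
                         build_sketch(g, two, split_at_10), s1, s2)
    print("exact:", exact_join(f, g), "plain:", est1, "partitioned:", est2)
```

In the extreme case of one partition per distinct value, every ξ² term equals 1 and the estimate becomes exact, but the sketch then needs one counter per value. The useful regime is a small number of well-chosen partitions, which trades a few extra counters for lower variance.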
Alin Dobra
University of Florida
Minos Garofalakis
Athena Research and Innovation Center In Information Communication & Knowledge Technologies
Johannes Gehrke
Microsoft (United States)
Cornell University
Alcatel Lucent (Germany)
Dobra et al. studied this question.
synapsesocial.com/papers/6a1bceb94ebd09f3dfa90c81 — DOI: https://doi.org/10.1145/564691.564699