June 25, 2003

Approximating a data stream for querying and estimation: algorithms and performance evaluation

Suvajyoti Guha (United States Food and Drug Administration), Nick Koudas (AT&T, United States)

Abstract

Obtaining fast, good-quality approximations to data distributions is a problem of central interest to database management. A variety of popular database applications, including approximate querying, similarity searching, and data mining in most application domains, rely on such approximations. Histogram-based approximation is a very popular method in database theory and practice for representing a data distribution succinctly and space-efficiently. In this paper, we place the problem of histogram construction into perspective and generalize it by removing the requirement of a finite data set and/or a known data set size. We consider the case in which data arrive continuously, forming an infinite data stream. In this context, we present single-pass algorithms that construct histograms of provably good quality. We also present algorithms for the fixed-window variant of the basic histogram construction problem, which support incremental maintenance of the histograms. The proposed algorithms allow a graceful tradeoff between accuracy and speed, based on application requirements. For approximate queries on infinite data streams, we present a detailed experimental evaluation on real data sets that compares our algorithms with other applicable techniques and shows that our approach performs better.
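To make the fixed-window setting in the abstract concrete, here is a minimal Python sketch that keeps only the most recent items of a stream and builds a histogram over them. It is not the authors' algorithm: it uses the textbook exact dynamic program for a minimum squared-error histogram and recomputes it from scratch on each request, whereas the paper describes approximate, incrementally maintained single-pass algorithms. The names `v_optimal_histogram` and `FixedWindowHistogram`, and the choice of sum of squared errors as the quality measure, are assumptions made for illustration.

```python
from collections import deque
from itertools import accumulate


def v_optimal_histogram(values, num_buckets):
    """Return (total_sse, buckets) for the exact minimum-SSE histogram.

    Each bucket is (start, end, mean) and covers values[start:end].
    Hypothetical helper: the exact O(n^2 * B) baseline, not the paper's method.
    """
    n = len(values)
    if n == 0:
        return 0.0, []
    b = min(num_buckets, n)
    # Prefix sums of values and of squared values give O(1) bucket error.
    s = [0.0] + list(accumulate(values))
    q = [0.0] + list(accumulate(v * v for v in values))

    def sse(i, j):
        # Sum of squared deviations from the mean over values[i:j].
        total = s[j] - s[i]
        return max(0.0, (q[j] - q[i]) - total * total / (j - i))

    inf = float("inf")
    # err[k][j]: smallest error when values[:j] is split into k buckets.
    err = [[inf] * (n + 1) for _ in range(b + 1)]
    cut = [[0] * (n + 1) for _ in range(b + 1)]
    err[0][0] = 0.0
    for k in range(1, b + 1):
        for j in range(k, n + 1):
            for i in range(k - 1, j):
                cand = err[k - 1][i] + sse(i, j)
                if cand < err[k][j]:
                    err[k][j] = cand
                    cut[k][j] = i
    # Walk the cut table backwards to recover the bucket boundaries.
    buckets = []
    j = n
    for k in range(b, 0, -1):
        i = cut[k][j]
        buckets.append((i, j, (s[j] - s[i]) / (j - i)))
        j = i
    buckets.reverse()
    return err[b][n], buckets


class FixedWindowHistogram:
    """Keeps the last `window` stream items and rebuilds the histogram on demand."""

    def __init__(self, window, num_buckets):
        self.items = deque(maxlen=window)
        self.num_buckets = num_buckets

    def add(self, value):
        # With maxlen set, appending drops the oldest item automatically.
        self.items.append(value)

    def histogram(self):
        return v_optimal_histogram(list(self.items), self.num_buckets)
```

Feeding items through `add` and calling `histogram()` returns the bucket boundaries and per-bucket means for the current window. Rebuilding from scratch costs time quadratic in the window size, which is the kind of cost that approximate, incrementally maintained histograms are meant to avoid.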

Cite This Study

Guha and Koudas (June 25, 2003) studied this question.

synapsesocial.com/papers/6a130805b761793c20c100c1 https://doi.org/10.1109/icde.2002.994775