Data preparation and data profiling comprise many tasks, both basic and complex, that analyze a dataset at hand and extract metadata, such as data distributions, key candidates, and functional dependencies. Among the most important types of metadata is the number of distinct values in a column, also known as the zeroth frequency moment. Cardinality estimation, the problem of estimating this number, has been an active research topic over the past decades due to its many applications. The aim of this paper is to review the literature on cardinality estimation and to present a detailed experimental study of twelve algorithms, scaling far beyond the original experiments. First, we outline and classify approaches to the problem of cardinality estimation: we describe their main idea, error guarantees, advantages, and disadvantages. Our experimental survey then compares the performance of all twelve cardinality estimation algorithms. We evaluate the algorithms' accuracy, runtime, and memory consumption using synthetic and real-world datasets. Our results show that different algorithms excel in different categories, and we highlight their trade-offs.
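To make the zeroth frequency moment concrete, the following Python sketch contrasts an exact distinct count, which must remember every value, with a K-Minimum-Values (KMV) estimator that keeps only k hash values. KMV is used here only as a compact illustration of hash-based cardinality estimation; the function names, the SHA-1-based hash, and the default k=256 are assumptions for illustration, not the paper's implementation of any of the twelve surveyed algorithms.

```python
import hashlib
import heapq


def exact_distinct(values):
    """Exact zeroth frequency moment F0: the number of distinct values (O(n) memory)."""
    return len(set(values))


def _unit_hash(value):
    """Hypothetical helper: map a value to a pseudo-uniform float in [0, 1).

    Uses the first 8 bytes of SHA-1 over str(value), so results are deterministic.
    Note: values with the same string form (e.g. 1 and "1") hash identically.
    """
    digest = hashlib.sha1(str(value).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") / 2**64


def kmv_estimate(values, k=256):
    """K-Minimum-Values estimate of the number of distinct values.

    Keeps only the k smallest distinct hash values, so memory is O(k)
    regardless of how many rows are scanned.
    """
    heap = []   # negated hashes: heap[0] is minus the largest of the k smallest hashes
    kept = set()  # hashes currently in the heap, used to skip repeated values
    for v in values:
        h = _unit_hash(v)
        if h in kept:
            continue
        if len(heap) < k:
            heapq.heappush(heap, -h)
            kept.add(h)
        elif h < -heap[0]:
            # h is smaller than the current k-th smallest hash: swap it in.
            evicted = -heapq.heappushpop(heap, -h)
            kept.discard(evicted)
            kept.add(h)
    if len(heap) < k:
        # Fewer than k distinct hashes seen: the sketch holds them all, so the count is exact.
        return len(heap)
    kth_smallest = -heap[0]
    return (k - 1) / kth_smallest
```

The estimator relies on the k-th smallest of n uniform hashes lying near k/n, so (k - 1) / h_k recovers n; its relative standard error shrinks roughly as 1/sqrt(k), a small instance of the accuracy-versus-memory trade-off that the survey evaluates. For example, `kmv_estimate([i % 10000 for i in range(50000)], k=1024)` should return a value close to 10000 while storing only 1024 hashes.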
Hazar Harmouch
Amsterdam University of the Arts
Felix Naumann
Hasso Plattner Institute
Proceedings of the VLDB Endowment
University of Potsdam
Harmouch et al. studied the problem of cardinality estimation.
DOI: https://doi.org/10.1145/3186728.3164145