December 1, 2017

Cardinality estimation

Abstract

Data preparation and data profiling comprise many tasks, both basic and complex, that analyze a dataset at hand and extract metadata, such as data distributions, key candidates, and functional dependencies. Among the most important types of metadata is the number of distinct values in a column, also known as the zeroth-frequency moment. Cardinality estimation itself has been an active research topic in the past decades due to its many applications. The aim of this paper is to review the literature on cardinality estimation and to present a detailed experimental study of twelve algorithms, scaling far beyond the original experiments. First, we outline and classify approaches to solving the problem of cardinality estimation: we describe their main ideas, error guarantees, advantages, and disadvantages. Our experimental survey then compares the performance of all twelve cardinality estimation algorithms. We evaluate the algorithms' accuracy, runtime, and memory consumption using synthetic and real-world datasets. Our results show that different algorithms excel in different categories, and we highlight their trade-offs.
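To make the notion of a column's cardinality concrete, the following is a minimal sketch contrasting an exact distinct count with a bitmap-based estimate in the style of linear counting. It is an illustration only and is not taken from the paper: the function names, the bitmap size `num_bits`, the choice of BLAKE2b as hash function, and the synthetic column are all assumptions made for this example.

```python
import hashlib
import math


def exact_cardinality(values):
    """Exact number of distinct values; memory grows with the cardinality."""
    return len(set(values))


def linear_counting_estimate(values, num_bits=1 << 14):
    """Estimate the number of distinct values with a fixed-size bitmap.

    Each value is hashed to one of num_bits positions. If a fraction V of
    the positions is still empty after one pass, the estimate is
    -num_bits * ln(V). num_bits is an illustrative choice, not a tuned one.
    """
    bitmap = bytearray(num_bits)  # one byte per position, for simplicity
    for v in values:
        # A deterministic hash keeps the result reproducible across runs.
        digest = hashlib.blake2b(str(v).encode("utf-8"), digest_size=8).digest()
        bitmap[int.from_bytes(digest, "big") % num_bits] = 1
    zeros = num_bits - sum(bitmap)
    if zeros == 0:
        raise ValueError("bitmap saturated; increase num_bits")
    return -num_bits * math.log(zeros / num_bits)


# Hypothetical column: 100,000 rows holding 5,000 distinct values.
column = [i % 5000 for i in range(100_000)]
exact = exact_cardinality(column)
estimate = linear_counting_estimate(column)
print(f"exact={exact}, estimate={estimate:.0f}")
```

The exact count needs memory proportional to the number of distinct values, whereas the bitmap has a fixed size chosen up front. That is one instance of the trade-off between accuracy and memory consumption that the abstract refers to.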

Authors

Hazar Harmouch

Amsterdam University of the Arts

Felix Naumann

Hasso Plattner Institute

Journals

Proceedings of the VLDB Endowment

Institutions

University of Potsdam
