Database research and development is heavily influenced by benchmarks, such as the industry-standard TPC-H and TPC-DS for analytical systems. However, these twenty-year-old benchmarks capture neither how databases are deployed nor what workloads modern cloud data warehouse systems face. In this paper, we summarize well-known, confirm suspected, and unearth novel discrepancies between TPC-H/DS and actual workloads using empirical data. We base our analysis on telemetry from Amazon Redshift, one of the largest cloud data warehouse deployments. Among other findings, we show that write-heavy data pipelines are prominent, that workloads vary over time (in both load and type), that queries are repetitive, and that most properties of queries and workloads follow very long-tailed distributions. We conclude that data warehouse benchmarks, just like database systems, need to become more holistic and stop focusing solely on query engine performance. Finally, we publish a dataset containing query statistics of 200 randomly selected Redshift serverless instances and 200 randomly selected provisioned instances over a three-month period, as a basis for building more realistic benchmarks.
Alexander van Renen
Dominik Horn
Pascal Pfeil
Proceedings of the VLDB Endowment
Amazon (United States)
Van Renen et al. studied this question.
www.synapsesocial.com/papers/68e61ca0b6db6435875aee44 — DOI: https://doi.org/10.14778/3681954.3682031