April 12, 2026 · Open Access

Poisson Sampling over Acyclic Joins

Key Points

The study introduces the problem of Poisson sampling over joins: sampling a join result by conceptually performing one Bernoulli trial per join tuple, with a tuple-specific probability.
It proposes an algorithm for Poisson sampling over acyclic joins.
The algorithm builds a random-access index that retrieves the i-th join tuple without materializing the join result.
Both the index and the sampling step are implemented in column stores, and the engineering trade-offs are studied.
Experiments on real-world data compare the approach against existing methods.
The proposed algorithm is nearly instance-optimal, running in time O(N + k log N), where N is the size of the input database and k is the size of the sample.
It significantly outperforms the repeated-Bernoulli-trial algorithm for Poisson sampling.
The random-access index alone can competitively implement Yannakakis' acyclic join processing algorithm when no sampling is required.
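
For orientation, the repeated-Bernoulli-trial baseline named in the key points can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes a two-relation join R(a, b) ⋈ S(b, c) held in Python lists, and the names `join_tuples`, `poisson_sample_naive`, and `prob` are hypothetical. Its cost is proportional to the size of the join result, because every join tuple receives its own trial.

```python
import random
from collections import defaultdict


def join_tuples(r, s):
    """Enumerate R(a, b) JOIN S(b, c) with a simple hash join (assumed schema)."""
    s_by_b = defaultdict(list)
    for b, c in s:
        s_by_b[b].append(c)
    for a, b in r:
        # .get avoids creating empty entries for unmatched join keys.
        for c in s_by_b.get(b, ()):
            yield (a, b, c)


def poisson_sample_naive(r, s, prob, rng):
    """Run one independent Bernoulli trial per join tuple.

    prob is a hypothetical callback mapping a join tuple to its
    inclusion probability in [0, 1].
    """
    return [t for t in join_tuples(r, s) if rng.random() < prob(t)]
```

Calling `poisson_sample_naive(r, s, lambda t: 0.1, random.Random(0))` would keep each join tuple independently with probability 0.1; a tuple-specific `prob` gives the non-uniform case.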

Abstract

We introduce the problem of Poisson sampling over joins: compute a sample of the result of a join query by conceptually performing a Bernoulli trial for each join tuple, using a non-uniform and tuple-specific probability. We propose an algorithm for Poisson sampling over acyclic joins that is nearly instance-optimal, running in time O(N + k log N), where N is the size of the input database and k is the size of the resulting sample. Our algorithm hinges on two building blocks: (1) the construction of a random-access index that, given a number i, provides random access to the i-th join tuple without fully materializing the (possibly large) join result; (2) the probing of this index to construct the result sample. We study the engineering trade-offs required to make both components practical, focusing on their implementation in column stores, and identify the best-performing alternative for each. Our experiments on real-world data demonstrate that this pair of alternatives significantly outperforms the repeated-Bernoulli-trial algorithm for Poisson sampling, and that the random-access index by itself can be used to competitively implement Yannakakis' acyclic join processing algorithm when no sampling is required. This shows that, as far as query engine design is concerned, it is possible to adopt a uniform basis for both classical acyclic join processing and Poisson sampling, without regret compared to classical join and sampling algorithms.
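
The two building blocks can be illustrated with a small sketch under strong simplifying assumptions; it is not the paper's algorithm. It assumes a two-relation join R(a, b) ⋈ S(b, c) rather than a general acyclic join, and a single uniform inclusion probability p rather than tuple-specific probabilities. The names `JoinIndex` and `poisson_sample_uniform` are hypothetical. The index stores prefix sums of per-tuple match counts so that the i-th join tuple is located with one binary search, and the sampler jumps directly between selected positions using geometrically distributed gaps, so it performs one index probe per sampled tuple instead of one trial per join tuple.

```python
import bisect
import math
import random
from collections import defaultdict
from itertools import accumulate


class JoinIndex:
    """Random access to R(a, b) JOIN S(b, c) without materializing the result."""

    def __init__(self, r, s):
        self.s_by_b = defaultdict(list)
        for b, c in s:
            self.s_by_b[b].append(c)
        # Keep only R tuples that have at least one join partner in S.
        self.r = [(a, b) for a, b in r if b in self.s_by_b]
        # ends[j] = number of join tuples produced by r[0..j] (prefix sums).
        self.ends = list(accumulate(len(self.s_by_b[b]) for _, b in self.r))
        self.size = self.ends[-1] if self.ends else 0

    def get(self, i):
        """Return the i-th join tuple (0-based) using one binary search."""
        if not 0 <= i < self.size:
            raise IndexError(i)
        j = bisect.bisect_right(self.ends, i)
        start = self.ends[j - 1] if j > 0 else 0
        a, b = self.r[j]
        return (a, b, self.s_by_b[b][i - start])


def poisson_sample_uniform(index, p, rng):
    """Sample each join tuple independently with the same probability p.

    Uniform p is an assumption of this sketch; gaps between selected
    positions are drawn from a geometric distribution.
    """
    if p <= 0.0:
        return []
    if p >= 1.0:
        return [index.get(i) for i in range(index.size)]
    log_q = math.log1p(-p)  # log(1 - p), accurate for small p
    sample, i = [], 0
    while True:
        u = 1.0 - rng.random()  # uniform in (0, 1]
        i += int(math.log(u) / log_q)  # number of tuples skipped
        if i >= index.size:
            return sample
        sample.append(index.get(i))
        i += 1
```

In this simplified setting, building the index takes time linear in the input and each probe costs one binary search, which mirrors the shape of the O(N + k log N) bound stated above. Handling tuple-specific probabilities and joins over more than two relations requires the machinery described in the paper itself.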

Authors

Liese Bekkers

Frank Neven

Lorrens Pantelis
