We introduce fast algorithms for selecting a random sample of n records without replacement from a pool of N records, where the value of N is unknown beforehand. The main result of the paper is the design and analysis of Algorithm Z; it does the sampling in one pass using constant space and in O(n(1 + log(N/n))) expected time, which is optimum, up to a constant factor. Several optimizations are studied that collectively improve the speed of the naive version of the algorithm by an order of magnitude. We give an efficient Pascal-like implementation that incorporates these modifications and that is suitable for general use. Theoretical and empirical results indicate that Algorithm Z outperforms current methods by a significant margin.
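To make the problem setting concrete, here is a minimal sketch of the basic one-pass reservoir method, the simple baseline often called Algorithm R. It is not Algorithm Z: it draws one random number for every record and so runs in O(N) time, whereas the abstract's O(n(1 + log(N/n))) bound comes from skipping over records instead. The function name `reservoir_sample` and its parameters are illustrative assumptions, not taken from the paper.

```python
import random
from typing import Iterable, List, Optional, TypeVar

T = TypeVar("T")


def reservoir_sample(
    records: Iterable[T], n: int, rng: Optional[random.Random] = None
) -> List[T]:
    """Pick n records uniformly at random, without replacement, in one pass.

    Basic reservoir method (NOT Algorithm Z). The pool size N is never
    needed in advance; only the n-slot reservoir is kept in memory.
    """
    if n < 0:
        raise ValueError("n must be non-negative")
    rng = rng or random.Random()
    reservoir: List[T] = []
    for t, record in enumerate(records):  # t = number of records seen so far
        if t < n:
            # The first n records fill the reservoir directly.
            reservoir.append(record)
        else:
            # Record t + 1 enters the sample with probability n / (t + 1),
            # replacing a uniformly chosen slot.
            j = rng.randrange(t + 1)
            if j < n:
                reservoir[j] = record
    return reservoir


if __name__ == "__main__":
    # The input is a plain iterator, so its length is unknown to the sampler.
    print(reservoir_sample(iter(range(1_000_000)), 5))
```

If the pool holds fewer than n records, this sketch simply returns all of them. Passing a seeded `random.Random` instance makes a run repeatable.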
Jeffrey Scott Vitter studied this question.
www.synapsesocial.com/papers/69d739b25f9a1dad5348f747 — DOI: https://doi.org/10.1145/3147.3165
Jeffrey Scott Vitter
ACM Transactions on Mathematical Software
Brown University