Starting with a set of weighted items, we want to create a generic sample of a certain size that we can later use to estimate the total weight of arbitrary subsets. For this purpose, we propose priority sampling, which, tested on Internet data, performed better than previous methods by orders of magnitude. Priority sampling is simple to define and implement: we consider a stream of items i = 0, ..., n-1 with weights wᵢ. For each item i, we generate a random number rᵢ in (0, 1) and create a priority qᵢ = wᵢ/rᵢ. The sample S consists of the k highest-priority items. Let t be the (k+1)th highest priority. Each sampled item i in S gets a weight estimate Wᵢ = max{wᵢ, t}, while non-sampled items get weight estimate Wᵢ = 0. Magically, it turns out that the weight estimates are unbiased, that is, E[Wᵢ] = wᵢ, and by linearity of expectation, we get unbiased estimators over any subset sum simply by adding the sampled weight estimates from the subset. Also, we can estimate the variance of the estimates, and surprisingly, there is no covariance between different weight estimates Wᵢ and Wⱼ. We conjecture an extremely strong near-optimality, namely that for any weight sequence, there exists no specialized scheme for sampling k items with unbiased estimators that gets smaller total variance than priority sampling with k+1 items. Very recently, Mario Szegedy has settled this conjecture.
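The sampling rule described in the abstract can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function names `priority_sample` and `estimate_subset_sum` are hypothetical, weights are assumed to be positive, rᵢ is drawn from (0, 1] rather than (0, 1) to avoid division by zero, and the case of k or fewer items (where every item is kept with its exact weight) is an assumption not spelled out in the abstract.

```python
import heapq
import random


def priority_sample(weights, k, rng=None):
    """Return {item index: weight estimate W_i} for a priority sample of size k.

    Hypothetical helper; assumes all weights are positive.
    """
    rng = rng if rng is not None else random.Random()
    prioritized = []
    for i, w in enumerate(weights):
        # random() lies in [0, 1), so 1 - random() lies in (0, 1].
        r = 1.0 - rng.random()
        prioritized.append((w / r, i))  # priority q_i = w_i / r_i

    # The k+1 highest priorities: the first k form the sample S,
    # and the (k+1)th supplies the threshold t.
    top = heapq.nlargest(k + 1, prioritized)
    if len(top) <= k:
        # Assumption: with at most k items, keep everything and use t = 0.
        threshold, sampled = 0.0, top
    else:
        threshold, sampled = top[k][0], top[:k]

    # Sampled items get W_i = max{w_i, t}; all other items implicitly get 0.
    return {i: max(weights[i], threshold) for _, i in sampled}


def estimate_subset_sum(estimates, subset):
    """Estimate the total weight of `subset` by summing sampled estimates."""
    return sum(estimates.get(i, 0.0) for i in subset)


if __name__ == "__main__":
    weights = [1.0, 2.0, 3.0, 4.0, 5.0, 10.0, 20.0]
    sample = priority_sample(weights, k=3, rng=random.Random(7))
    print(sample)
    print(estimate_subset_sum(sample, {0, 2, 5}))
```

A single call gives a noisy estimate. The unbiasedness claim concerns the average over many independent samples, which should approach the true subset sum (14 for items 0, 2, and 5 in the example weights).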
Nick Duffield
Mitchell Institute
Carsten Lund
University of Hagen
Mikkel Thorup
University of Copenhagen
Duffield et al. (Fri,) studied this question.
synapsesocial.com/papers/6a212e2ba2a97f3a085ac7e6 — DOI: https://doi.org/10.48550/arxiv.cs/0509026