I need an algorithm to select K
elements from an array of length N
. Since N
can be a very large number, the algorithm should work with a steamed collection. To be more concrete, this is how the collection looks: [(el1, 1000), (el2, 500), ...., (elN, 230)]
. Elements with higher weights should have more chances to get into the resulting array.
I did some research and found a paper WEIGHTED RESERVOIR SAMPLING. It seems to be a good solution and it does what I need.
However, I also read about Fitness proportionate selection, which seems like it solves the problem too.
I struggle to understand the difference between them and which one should I choose?
CodePudding user response:
Neither one of those algorithms handles without-replacement sampling without modification.
If you want to simulate the manual process of drawing tickets until there are k distinct winners, then you can assign each element e a key −log1p(−random())/weight(e) where random() returns a uniform random double-precision floating point number between 0 inclusive and 1 exclusive, i.e., an exponential random variate with rate weight(e), and choose the k elements with the smallest keys in a streaming fashion.