Sampling non-repeating integers with given probability distribution

Time:11-23

I need to sample n different values taken from a set of integers.

These integers should have different occurrence probabilities, e.g. the larger the value, the likelier it is to be drawn.

Using the random package I can sample a set of distinct values from the set, by means of the method

random.sample

However, it doesn't seem to provide a way to associate a probability distribution.

On the other hand, the numpy package does allow a distribution to be associated, but it returns a sample with repetitions. This can be done with the method

numpy.random.choice

I am looking for a method (or a workaround) that does what the two methods do, but together.

CodePudding user response:

You can actually use numpy.random.choice, as it has the replace parameter. If it is set to False, the sampling is done without replacement.

Here's a random example:

>>> import numpy as np
>>> np.random.choice([1, 2, 4, 6, 9], 3, replace=False, p=[1/2, 1/8, 1/8, 1/8, 1/8])
array([1, 9, 4])
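
To match the "larger is likelier" requirement from the question, the probabilities can be derived from the values themselves. The following is only a sketch: the helper name weighted_sample_without_replacement is made up for illustration, it assumes all integers are positive, and it assumes weights directly proportional to the values (any other non-negative weighting would work the same way). It uses NumPy's Generator interface, whose choice method accepts the same replace and p arguments.

```python
import numpy as np

def weighted_sample_without_replacement(values, n, seed=None):
    """Draw n distinct values; larger values are more likely to be picked."""
    values = np.asarray(values)
    # Assumption: weights proportional to the (positive) values themselves.
    p = values / values.sum()
    rng = np.random.default_rng(seed)
    # replace=False guarantees that no value appears twice in the result.
    return rng.choice(values, size=n, replace=False, p=p)

sample = weighted_sample_without_replacement([1, 2, 4, 6, 9], 3, seed=0)
print(sample)
```

Note that n cannot exceed the number of values with non-zero probability, otherwise NumPy raises a ValueError.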

CodePudding user response:

Let size be the size of the sample and p the probability distribution (a normalized list of probabilities).

First generate the size of the sample as follows

n = numpy.random.choice(size, 1, p=p)[0]

Then create the sample from the desired set of integers as follows:

random.sample(integers, n)
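
The two steps of this answer can be put together as a runnable sketch. The function name sample_with_random_size and the example values are assumptions made for illustration. Here p is read as a distribution over the possible sample sizes 0 .. len(p) - 1, so p only controls how many integers are drawn; random.sample then picks them uniformly, without repetition.

```python
import random
import numpy as np

def sample_with_random_size(integers, p, seed=None):
    """Draw a sample size n according to p, then pick n distinct integers."""
    rng = np.random.default_rng(seed)
    # Step 1: draw n from {0, ..., len(p) - 1} with probabilities p.
    n = int(rng.choice(len(p), p=p))
    # Step 2: pick n distinct elements uniformly (no weighting here).
    return random.Random(seed).sample(list(integers), n)

# Hypothetical input: sizes 1, 2, 3 with probabilities 0.2, 0.3, 0.5.
result = sample_with_random_size([1, 2, 4, 6, 9], [0.0, 0.2, 0.3, 0.5], seed=1)
print(result)
```

The largest possible size, len(p) - 1, must not exceed the number of integers, or random.sample raises a ValueError.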