np.random.choice with a big probabilities array

Time:01-16

I know that we can pass a probability array to the choice function, but my question is how that works for big arrays. Say I want 1,000 random numbers between 0 and 65535. How can I define the probability array so that a number below 1000 is drawn with probability 0.4 and a number of 1000 or more is drawn with probability 0.6?

I tried passing the range of numbers to the choice function, but apparently it doesn't work that way.

Answer:

According to the NumPy docs, each element of the argument p is the probability of drawing the corresponding element of a.
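A tiny illustration of that elementwise pairing, using made-up values (the array `letters` and its `weights` are purely illustrative): each weight applies to the element at the same index, and a p whose length differs from a is rejected with a ValueError rather than being broadcast or padded.

```python
import numpy as np

np.random.seed(0)  # fixed seed so the run is reproducible

# Each weight applies to the element at the same index:
# "x" with probability 0.7, "y" with 0.2, "z" with 0.1.
letters = np.array(["x", "y", "z"])
weights = np.array([0.7, 0.2, 0.1])
draws = np.random.choice(letters, size=10_000, p=weights)
freq_x = np.mean(draws == "x")  # should be close to 0.7

# A p whose length differs from a is rejected, not broadcast or padded.
mismatch_rejected = False
try:
    np.random.choice(letters, p=[0.5, 0.5])
except ValueError:
    mismatch_rejected = True
print(freq_x, mismatch_rejected)
```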

Since p and a need to have the same size, create a p of the same size as a:

```python
import numpy as np

a = np.arange(65536)               # the values 0..65535
n_elem = len(a)

p = np.zeros_like(a, dtype=float)  # one probability per element of a
```

Now find all the elements of a that are less than 1000 and set p at those indices to 0.4 divided by the number of such elements; the remaining indices get 0.6 divided by the number of remaining elements. Since a is an arange, you already know that exactly its first 1000 elements are below 1000, so you can hardcode the calculation:

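```python
p[:1000] = 0.4 / 1000    # the 1000 values below 1000 share 0.4
p[1000:] = 0.6 / 64536   # the 65536 - 1000 = 64536 other values share 0.6
```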
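Written out, the hardcoded values give every value below 1000 the same probability and every one of the 65536 - 1000 = 64536 remaining values the same, smaller probability, so the two groups total 0.4 and 0.6. Because a is an arange, the index i is also the value being drawn:

```latex
p_i =
\begin{cases}
  \dfrac{0.4}{1000} = 4 \times 10^{-4}, & 0 \le i < 1000, \\[6pt]
  \dfrac{0.6}{64536} \approx 9.297 \times 10^{-6}, & 1000 \le i \le 65535,
\end{cases}
\qquad
\sum_{i=0}^{65535} p_i = 1000 \cdot \frac{0.4}{1000} + 64536 \cdot \frac{0.6}{64536} = 0.4 + 0.6 = 1.
```

So any single value below 1000 is about 43 times as likely as any single value of 1000 or more, even though the low group as a whole is less likely than the high group.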

For the general case where a is not derived from an arange, you could do:

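```python
lt1k = a < 1000                       # boolean mask: True where the value is below 1000
n_lt1k = lt1k.sum()                   # how many elements are below 1000

p[lt1k] = 0.4 / n_lt1k                # split 0.4 evenly over that group
p[~lt1k] = 0.6 / (n_elem - n_lt1k)    # split 0.6 evenly over the rest
```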
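The general-case recipe can be wrapped in a small helper. The name `two_group_probabilities` and its signature are hypothetical, chosen only for illustration; it assumes the mask contains at least one True and one False entry (otherwise one of the divisions would be by zero, so the sketch raises instead).

```python
import numpy as np

def two_group_probabilities(mask, p_true):
    """Hypothetical helper: build a per-element probability array.

    Elements where `mask` is True share total probability `p_true`
    equally; the remaining elements share `1 - p_true` equally.
    """
    mask = np.asarray(mask, dtype=bool)
    n_true = mask.sum()
    n_false = mask.size - n_true
    if n_true == 0 or n_false == 0:
        raise ValueError("mask must contain both True and False entries")
    p = np.empty(mask.shape, dtype=float)
    p[mask] = p_true / n_true
    p[~mask] = (1.0 - p_true) / n_false
    return p

# An `a` that is not an arange: unsorted, with a duplicated value.
# Probabilities are per element, so the value 999 (present twice)
# is drawn with twice the weight of any other single low element.
a = np.array([5, 65535, 999, 1000, 42, 20000, 999])
p = two_group_probabilities(a < 1000, 0.4)
print(p)
```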

Note that p must sum to 1:

```python
assert np.allclose(p.sum(), 1.0)
```
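This check also guards against a likely first attempt: assigning the group probability (0.4 or 0.6) to every element instead of dividing it by the group size. The sketch below shows that such a p sums to far more than 1, that np.random.choice rejects it with a ValueError, and that simply renormalising it does not restore the intended 40/60 split.

```python
import numpy as np

a = np.arange(65536)

# Mistake: every element gets its group's total probability.
p_wrong = np.where(a < 1000, 0.4, 0.6)
wrong_total = p_wrong.sum()  # 1000*0.4 + 64536*0.6, far above 1

rejected = False
try:
    np.random.choice(a, size=1000, p=p_wrong)
except ValueError:
    rejected = True  # choice checks that p sums to 1

# Renormalising "fixes" the sum but not the split: the low group's
# share becomes 400 / wrong_total (about 1%), not the intended 40%.
p_renorm = p_wrong / wrong_total
low_share = p_renorm[a < 1000].sum()
print(wrong_total, rejected, low_share)
```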

Now pass a and p to np.random.choice:

```python
selection = np.random.choice(a, size=(1000,), p=p)
```
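Because the requested distribution is uniform within each group, the same result can also be sampled without materialising a 65536-element p. This is a different technique from the one above, a two-stage draw: first pick a group for each sample (low with probability 0.4), then draw uniformly inside that group. The function `two_stage_sample` and its parameters are hypothetical; it uses NumPy's Generator API (np.random.default_rng, Generator.random, Generator.integers).

```python
import numpy as np

def two_stage_sample(size, p_low=0.4, split=1000, high=65536, rng=None):
    """Hypothetical sketch: draw `size` integers in [0, high) so that a
    value below `split` occurs with probability `p_low`, uniformly
    within each group. Same distribution as choice() with the
    piecewise-constant p, but p is never built."""
    rng = np.random.default_rng() if rng is None else rng
    in_low = rng.random(size) < p_low                  # stage 1: pick a group per draw
    low_vals = rng.integers(0, split, size=size)       # stage 2: uniform in [0, split)
    high_vals = rng.integers(split, high, size=size)   # uniform in [split, high)
    return np.where(in_low, low_vals, high_vals)

selection = two_stage_sample(1000, rng=np.random.default_rng(0))
print((selection < 1000).mean())
```

Stage 2 draws a candidate from both groups for every position and keeps one of them; that wastes some random numbers but keeps the code fully vectorised.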

To check that roughly 40% of the selected values are below 1000, compute the fraction of the selection that is less than 1000:

```python
print((selection < 1000).sum() / len(selection))  # should print a number close to 0.4
```
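With only 1000 draws the printed fraction can drift a few percentage points from 0.4, since its standard deviation is sqrt(0.4 * 0.6 / 1000), roughly 0.015. A sketch of a tighter check, assuming a seeded Generator (np.random.default_rng), a larger sample, and an arbitrary tolerance of five standard errors:

```python
import numpy as np

a = np.arange(65536)
p = np.where(a < 1000, 0.4 / 1000, 0.6 / 64536)  # same p as above

rng = np.random.default_rng(12345)  # seeded for reproducibility
n = 100_000
selection = rng.choice(a, size=n, p=p)

frac_low = np.mean(selection < 1000)
std_err = np.sqrt(0.4 * 0.6 / n)  # binomial standard error of the fraction
within_tolerance = abs(frac_low - 0.4) < 5 * std_err
print(frac_low, within_tolerance)
```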