Efficiently generating synthetic data from a numpy array

Time:08-12

Let's say I have a set of probabilities in a numpy array:

[0.25, 0.125, 0.125, 0.50]

I want to generate N one-hot encoded outcomes, each occurring with the frequency given by its probability:

[0, 0, 0, 1] should be the outcome 50% of the time
[0, 0, 1, 0] should be the outcome 12.5% of the time

and so on.
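One way to see the target (a small illustration, not from the original post): the four one-hot outcomes are exactly the rows of a 4x4 identity matrix. Drawing an outcome is therefore the same as drawing a row index 0 to 3, which is what both answers below do.

```python
import numpy as np

# Each one-hot outcome is a row of the 4x4 identity matrix:
# row 0 -> [1, 0, 0, 0], ..., row 3 -> [0, 0, 0, 1].
outcomes = np.eye(4, dtype=int)

# So "draw outcome [0, 0, 0, 1]" is the same as "draw row index 3".
print(outcomes[3])  # [0 0 0 1]
```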

I could write something to generate these one by one, but I need several million results, so a vectorized NumPy approach would be ideal.
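A hedged aside, not taken from either answer: a multinomial draw with a single trial (n=1) is itself a one-hot vector, so NumPy's `Generator.multinomial` can produce all N rows in one call. The sample size and seed below are illustrative choices, not from the question.

```python
import numpy as np

p = np.array([0.25, 0.125, 0.125, 0.50])
n_samples = 1_000_000
rng = np.random.default_rng(seed=0)  # seeded only so the run is reproducible

# One trial per row (n=1): each row of counts contains a single 1, i.e. it is one-hot.
one_hot = rng.multinomial(1, p, size=n_samples)

print(one_hot.shape)         # (1000000, 4)
print(one_hot.mean(axis=0))  # empirical frequencies, close to p
```

This skips the separate "draw an index, then map it to a row" step; the answers below take that two-step route instead.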

Answer:

IIUC, you can use numpy.random.choice with the array as the probabilities (its p argument) to choose where to place the 1 in an array of zeros:

import numpy as np

a = np.array([0.25, 0.125, 0.125, 0.50])
out = np.zeros_like(a, dtype=int)        # all-zero vector, same length as a
out[np.random.choice(len(a), p=a)] = 1   # set one index, drawn with probabilities a
print(out)

Example output (a single draw): [0 0 0 1]
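The snippet above produces one outcome per call. A minimal sketch of the same "place the 1 in an array of zeros" idea for N rows at once, assuming N = 1,000,000 and NumPy's newer `default_rng` Generator (both illustrative choices): draw all indices in one `choice` call, then use fancy indexing to set one 1 per row.

```python
import numpy as np

a = np.array([0.25, 0.125, 0.125, 0.50])
n = 1_000_000
rng = np.random.default_rng(seed=0)  # seeded only for reproducibility

# Draw all n category indices in one call instead of one at a time.
idx = rng.choice(len(a), size=n, p=a)

# Start from an (n, 4) array of zeros and set exactly one 1 per row:
# row i gets its 1 in column idx[i].
out = np.zeros((n, len(a)), dtype=int)
out[np.arange(n), idx] = 1

print(out[:3])
```

Pairing `np.arange(n)` with `idx` selects one (row, column) cell per row, so no Python-level loop is needed.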

Answer:

NumPy's random.choice accepts probabilities through its p argument, and its size argument draws many samples in one call.

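import numpy as np
opts = [0.25, 0.125, 0.125, 0.5]
# one-hot row for each of the four outcomes
outcomes = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]]
# draw 500 outcome indices (0-3) with probabilities opts
z = np.random.choice(4, size=500, p=opts)
print(z)
# map each index to its one-hot row -> array of shape (500, 4)
z = np.take(outcomes, z, axis=0)
print(z)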
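To sanity-check that the one-hot rows come out in the right proportions, one option (a sketch, assuming a larger sample of 1,000,000 and a fixed seed for reproducibility) is to average the one-hot matrix column-wise: each column is an indicator for one outcome, so its mean is that outcome's empirical frequency.

```python
import numpy as np

opts = [0.25, 0.125, 0.125, 0.5]
outcomes = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]]

np.random.seed(0)  # seeded only so the check is reproducible
z = np.random.choice(4, size=1_000_000, p=opts)
z = np.take(outcomes, z, axis=0)  # shape (1000000, 4)

# Column means of a one-hot matrix are the empirical outcome frequencies.
freqs = z.mean(axis=0)
print(freqs)  # each entry should be near the matching value in opts
```

At this sample size the standard error for the 0.5 outcome is about 0.0005, so the frequencies should agree with opts to roughly two or three decimal places.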