Efficiently generating synthetic data from a numpy array

Let's say I have a set of probabilities in a numpy array:

[0.25, 0.125, 0.125, 0.50]

And I want to generate N one-hot encoded outcomes, proportional to each probability:

[0, 0, 0, 1] should be the outcome 50% of the time

[0, 0, 1, 0] should be the outcome 12.5% of the time

and so on.

I could write something to generate these one by one, but I need several million results, so a vectorized numpy solution would be ideal.

CodePudding user response：

IIUC, you can use numpy.random.choice with the array as the probabilities to pick which position in an array of zeros gets set to 1:

import numpy as np

a = np.array([0.25, 0.125, 0.125, 0.50])
out = np.zeros_like(a, dtype=int)
# pick one index according to the probabilities in a and set that position to 1
out[np.random.choice(len(a), p=a)] = 1

print(out)

Example output: [0 0 0 1]
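
The snippet above produces a single outcome. Since the question asks for several million, here is a minimal sketch of how the same idea might be extended to N draws in one call. The function name one_hot_samples, the int8 dtype and the seed value are illustrative assumptions, not part of the original answer; it also uses the newer np.random.default_rng generator instead of np.random.choice.

```python
import numpy as np

def one_hot_samples(p, n, seed=None):
    """Draw n one-hot rows; the position of the 1 follows the probabilities p."""
    p = np.asarray(p, dtype=float)
    rng = np.random.default_rng(seed)        # seed is optional, only for reproducibility
    idx = rng.choice(len(p), size=n, p=p)    # one category index per row
    out = np.zeros((n, len(p)), dtype=np.int8)
    out[np.arange(n), idx] = 1               # set exactly one 1 in each row
    return out

samples = one_hot_samples([0.25, 0.125, 0.125, 0.50], n=1_000_000, seed=0)
print(samples.shape)
print(samples.mean(axis=0))   # empirical frequencies, should be close to p
```

Each row of samples is one outcome, and the column means give a quick sanity check against the requested probabilities.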

CodePudding user response：

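Numpy's random.choice accepts a p argument giving the probability of each option, and its size argument lets you draw all the samples in one call.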

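import numpy as np
opts = [0.25, 0.125, 0.125, 0.5]  # probability of each outcome
outcomes = [[1,0,0,0],[0,1,0,0],[0,0,1,0],[0,0,0,1]]  # one-hot row per outcome
z = np.random.choice(4, size=500, p=opts)  # 500 outcome indices drawn with probabilities opts
print(z)
z = np.take(outcomes, z, axis=0)  # look up the one-hot row for each index, shape (500, 4)
print(z)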
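
As a possible alternative to drawing indices and then looking up rows, a multinomial draw with a single trial already yields a one-hot vector, so the whole array can be sampled in one call. This is only a sketch: it assumes a NumPy version that provides np.random.default_rng, and the variable names p and one_hot and the seed are illustrative.

```python
import numpy as np

p = [0.25, 0.125, 0.125, 0.50]
rng = np.random.default_rng(seed=0)   # seed chosen only to make the run repeatable

# One trial per row: each row has a single 1 at the sampled category.
one_hot = rng.multinomial(1, p, size=500)

print(one_hot[:5])          # first few one-hot rows
print(one_hot.sum(axis=0))  # how many times each outcome was drawn
```

The result has shape (500, 4), matching the output of the np.take version, without having to write out the outcomes list by hand.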