Random choice out of 1D array with 2-dimensional probability in Python

I would like to choose randomly from a list of 3 elements (HGA, CGA, SGA), but the probabilities are stored in 3 lists, one per element.

My probabilities are given by (the lists have the same length):

Probs = { 'HGA':prob['HGA'], 'CGA':prob['CGA'], 'SGA':prob['SGA'] }

with prob looking like this:

prob['HGA']=[0.5,0.2,0.4,0.6, ...]

Now I want to create, without using a loop, another list that looks something like this:

particles = ['HGA', 'CGA', 'CGA', 'CGA', 'SGA' ...]

'particles' should have the same length as the probability lists.

CodePudding user response：

Assuming Probs gives the probability of selecting each key (with the values summing to 1), you can use numpy.random.choice:

import numpy as np

Probs = {'HGA': 0.1, 'CGA': 0.2, 'SGA': 0.7}

# Draw 100 keys; p follows the key order of the dict and must sum to 1
particles = np.random.choice(list(Probs), p=list(Probs.values()), size=100)

output:

array(['SGA', 'SGA', 'SGA', 'SGA', 'SGA', 'SGA', 'SGA', 'HGA', 'HGA',
       'HGA', 'SGA', 'SGA', 'SGA', 'SGA', 'SGA', 'SGA', 'SGA', 'CGA',
       'SGA', 'CGA', 'SGA', 'SGA', 'SGA', 'CGA', 'SGA', 'SGA', 'SGA',
       'HGA', 'SGA', 'SGA', 'SGA', 'SGA', 'SGA', 'SGA', 'SGA', 'SGA',
       'HGA', 'SGA', 'SGA', 'CGA', 'CGA', 'SGA', 'SGA', 'SGA', 'SGA',
       'SGA', 'CGA', 'CGA', 'CGA', 'SGA', 'CGA', 'SGA', 'CGA', 'CGA',
       'CGA', 'SGA', 'CGA', 'CGA', 'SGA', 'SGA', 'HGA', 'SGA', 'HGA',
       'SGA', 'SGA', 'SGA', 'SGA', 'SGA', 'HGA', 'CGA', 'CGA', 'CGA',
       'CGA', 'SGA', 'SGA', 'HGA', 'SGA', 'SGA', 'CGA', 'SGA', 'HGA',
       'SGA', 'SGA', 'SGA', 'SGA', 'CGA', 'SGA', 'CGA', 'CGA', 'SGA',
       'HGA', 'SGA', 'HGA', 'SGA', 'CGA', 'SGA', 'SGA', 'CGA', 'SGA',
       'SGA'], dtype='<U3')

For a list, use:

particles = (np.random.choice(list(Probs), p=list(Probs.values()), size=100)
               .tolist()
             )
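The same draw can also be written with NumPy's newer Generator interface, which makes it easy to seed the result. This is only a minimal sketch: the seed value 0 and the names rng, labels and weights are illustrative choices, not part of the answer above.

```python
import numpy as np

# Same toy probabilities as above; the values must sum to 1.
Probs = {'HGA': 0.1, 'CGA': 0.2, 'SGA': 0.7}

# A seeded Generator makes the draw reproducible (the seed is arbitrary).
rng = np.random.default_rng(0)

labels = list(Probs)            # ['HGA', 'CGA', 'SGA']
weights = list(Probs.values())  # probabilities in the same order as labels

# 100 independent draws, converted to a plain Python list
particles = rng.choice(labels, p=weights, size=100).tolist()
```

Re-running with the same seed should give the same list, which can help when debugging.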

CodePudding user response：

If I understood correctly, the i-th element of each probability list is the probability of sampling the corresponding item at the i-th step, which means the i-th elements of all the lists should always sum to 1. If so, this should be what you are asking for. I made a toy example:

import numpy as np

Probs = {'HGA':[0.2, 0.6, 0.2], 'CGA':[0.7, 0.1, 0.3], 'SGA':[0.1, 0.3, 0.5]}
values = list(Probs.keys())

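# One draw per step; zip(*...) yields the probabilities of the three labels at each step
particles = [np.random.choice(values, p=sample_probs) for sample_probs in zip(*Probs.values())]

# Example result (random, so it varies between runs): ['CGA', 'HGA', 'HGA']
print(particles)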
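The assumption that the probabilities at each step sum to 1 matters, because np.random.choice raises a ValueError when they do not. A quick check might look like the sketch below; the names p, row_sums, is_normalized and p_normalized are illustrative, and the rescaling step only makes sense if the lists are meant as relative weights.

```python
import numpy as np

Probs = {'HGA': [0.2, 0.6, 0.2], 'CGA': [0.7, 0.1, 0.3], 'SGA': [0.1, 0.3, 0.5]}

# Rows are steps, columns are the three labels (in dict order).
p = np.column_stack(list(Probs.values()))

# Sum over the labels for each step and compare with 1 (up to rounding error).
row_sums = p.sum(axis=1)
is_normalized = np.allclose(row_sums, 1.0)

# If the lists held unnormalized weights instead, each row could be rescaled.
p_normalized = p / row_sums[:, None]
```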

For a fast vectorized version, following this excellent answer:

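def vectorized_choice(p, n, items=None):
    # p: (steps, choices) array whose rows sum to 1; n: number of draws per step
    s = p.cumsum(axis=1)                   # cumulative probabilities per row
    r = np.random.rand(p.shape[0], n, 1)   # one uniform number per draw
    q = np.expand_dims(s, 1) >= r          # True where the cumulative sum reaches r
    k = q.argmax(axis=-1)                  # index of the first True = sampled choice
    if items is not None:
        k = np.asarray(items)[k]           # map indices to labels
    return k

p = np.column_stack(tuple(Probs.values()))  # shape (steps, choices)
n = 1
items = list(Probs.keys())

sample = vectorized_choice(p, n, items)     # shape (steps, n)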
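Since the result has shape (steps, n), getting the flat 'particles' list from the question takes one more step. The sketch below repeats the function so that it runs on its own, and adds an optional rng argument, which is a hypothetical extension and not part of the answer above. It also draws many samples per step as a rough check that the empirical frequencies approach p.

```python
import numpy as np

def vectorized_choice(p, n, items=None, rng=None):
    # rng is a hypothetical extra parameter added here for reproducibility
    rng = np.random.default_rng() if rng is None else rng
    s = p.cumsum(axis=1)                    # cumulative probabilities per step
    r = rng.random((p.shape[0], n, 1))      # one uniform number per draw
    k = (np.expand_dims(s, 1) >= r).argmax(axis=-1)  # first index where s >= r
    return k if items is None else np.asarray(items)[k]

Probs = {'HGA': [0.2, 0.6, 0.2], 'CGA': [0.7, 0.1, 0.3], 'SGA': [0.1, 0.3, 0.5]}
p = np.column_stack(list(Probs.values()))   # shape (steps, choices)
items = list(Probs)

# One draw per step gives shape (3, 1); take column 0 and convert to a list.
sample = vectorized_choice(p, 1, items, rng=np.random.default_rng(0))
particles = sample[:, 0].tolist()

# Many draws per step (as indices) to compare empirical frequencies with p.
draws = vectorized_choice(p, 20000, rng=np.random.default_rng(1))
freq = np.stack([(draws == j).mean(axis=1) for j in range(p.shape[1])], axis=1)
```

Here freq has the same layout as p (one row per step, one column per label), so the two arrays can be compared directly.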