Is there a way to vectorize this loop?

I'm trying to simulate the results of two different dice. One die is fair (i.e. the probability of each number is 1/6), but the other isn't.

I have a NumPy array of 0s and 1s saying which die is used for each throw, 0 being the fair one and 1 the other. I'd like to compute another NumPy array with the results. To do this, I have used the following code:

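import numpy as np
import numpy.random as rnd  # assumed: rnd refers to numpy.random

def dice_simulator(dices : np.ndarray) -> np.ndarray:
  n = len(dices)
  results = np.zeros(n)
  i = 0
  for dice in np.nditer(dices):
    if dice:
      # loaded die: faces 4-6 are three times as likely as faces 1-3
      results[i] = rnd.choice(6, p = [1/12, 1/12, 1/12, 1/4, 1/4, 1/4]) + 1
    else:
      # fair die
      results[i] = rnd.choice(6) + 1
    i += 1
  return results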

This takes a lot of time compared to the rest of the program, and I think it is because I'm iterating over a NumPy array instead of using vectorized operations. Can anyone help me with that?

Answer:

The answers already given vectorize by over-generating samples and throwing some of them away, which seems wrong to me.

I will also generalize to any number of dice.

First, you need a condlist: a list with one entry per die, where the i-th entry is a boolean array that is True wherever the i-th die is used:

import numpy as np

dices_idxs = np.array([0, 1, 2])
dices_sequence = np.array([0, 1, 2, 2, 1, 1, 0])

condlist = np.equal(*np.broadcast_arrays(dices_sequence[None, :], dices_idxs[:, None]))

print(condlist)

# [[ True False False False False False  True]
#  [False  True False False  True  True False]
#  [False False  True  True False False False]]
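Because == already broadcasts its operands, the explicit np.broadcast_arrays call can most likely be dropped. A minimal sketch, reusing the example arrays above, that checks both spellings build the same (number of dice) x (number of throws) matrix:

```python
import numpy as np

dices_idxs = np.array([0, 1, 2])
dices_sequence = np.array([0, 1, 2, 2, 1, 1, 0])

# (1, 7) compared with (3, 1) broadcasts to a (3, 7) boolean matrix:
# row i is True wherever die i is thrown.
condlist = dices_sequence[None, :] == dices_idxs[:, None]

# Same matrix as the np.equal(*np.broadcast_arrays(...)) version.
reference = np.equal(*np.broadcast_arrays(dices_sequence[None, :], dices_idxs[:, None]))
assert np.array_equal(condlist, reference)
```

Each column contains exactly one True, since every throw uses exactly one die.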

Second, you can generalize the answer given by @Ahmed AEK using np.select:

def dice_simulator_select(dices_sequence, dices_weights):
    faces = np.arange(1, 7)
    num_dices = len(dices_weights)
    dices_idxs = np.arange(num_dices)
    num_throws = len(dices_sequence)

    condlist = list(
        np.equal(*np.broadcast_arrays(dices_sequence[None, :], dices_idxs[:, None]))
    )
    choicelist = [
        RNG.choice(faces, size=num_throws, p=dices_weights[dice_idx])
        for dice_idx in range(num_dices)
    ]
    return np.select(condlist, choicelist)
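To see what np.select does with those two lists, here is a small deterministic sketch. The constant arrays are made-up stand-ins for the random draws (die k always "rolls" 10 * (k + 1)), so the routing is easy to read:

```python
import numpy as np

dices_sequence = np.array([0, 1, 2, 2, 1, 1, 0])
condlist = list(dices_sequence[None, :] == np.arange(3)[:, None])

# Stand-ins for RNG.choice(..., size=num_throws): one full-length array per die.
choicelist = [np.full(len(dices_sequence), 10 * (k + 1)) for k in range(3)]

# Position by position, np.select takes the entry from the first choice
# whose condition is True at that position.
picked = np.select(condlist, choicelist)
print(picked)  # [10 20 30 30 20 20 10]
```

Note that 3 x 7 = 21 values are built here and only 7 are kept; that is the over-generation discussed next.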

But it has the issue mentioned at the start: it draws num_throws values for every die and then discards most of them, which wastes work and which I find questionable with respect to randomness.

A more correct way is to use np.piecewise, which calls each function only on the throws where its condition is True:

def dice_simulator_piecewise(dices_sequence, dices_weights):
    faces = np.arange(1, 7)
    num_dices = len(dices_weights)
    dices_idxs = np.arange(num_dices)

    condlist = list(
        np.equal(*np.broadcast_arrays(dices_sequence[None, :], dices_idxs[:, None]))
    )
    # note: size=len(x) ensures no more samples than needed are generated;
    # x holds only the throws of one die, so x[0] is that die's index
    funclist = [
        lambda x: RNG.choice(faces, size=len(x), p=dices_weights[int(x[0])])
    ] * num_dices

    return np.piecewise(dices_sequence, condlist, funclist)
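The same idea can also be written as an explicit loop over dice (not over throws), with one RNG call per die sized to that die's throws. This is a hedged sketch of that alternative, not a description of np.piecewise internals; dice_simulator_masks and its rng parameter are hypothetical names:

```python
import numpy as np


def dice_simulator_masks(dices_sequence, dices_weights, rng=None):
    """Hypothetical helper: loop over dice, drawing exactly one value per throw."""
    rng = np.random.default_rng() if rng is None else rng
    dices_sequence = np.asarray(dices_sequence)
    faces = np.arange(1, 7)
    results = np.zeros(len(dices_sequence), dtype=np.int64)
    for dice_idx, weights in enumerate(dices_weights):
        mask = dices_sequence == dice_idx  # throws made with this die
        count = int(mask.sum())
        if count:  # skip dice that are never thrown
            # weights=None means a uniform (fair) die, as in dices_weights above
            results[mask] = rng.choice(faces, size=count, p=weights)
    return results


# One-hot weights make the output deterministic, so the routing is easy to check.
weights = [[1, 0, 0, 0, 0, 0], [0, 0, 0, 0, 0, 1]]
print(dice_simulator_masks([0, 1, 1, 0], weights))  # [1 6 6 1]
```

The Python-level loop runs once per die, so for a handful of dice its overhead is small compared with a loop over every throw.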

You can use the functions as follows. The np.piecewise version, which does not over-generate, is also faster (about 23% less time per call in the case below):

import numpy as np

RNG = np.random.default_rng()

dices_weights = [
    None,  # uniform
    [1 / 12, 1 / 12, 1 / 12, 1 / 4, 1 / 4, 1 / 4],
    None,
    [1 / 4, 1 / 4, 1 / 4, 1 / 12, 1 / 12, 1 / 12],
    None,
    [1 / 12, 1 / 12, 1 / 12, 1 / 4, 1 / 4, 1 / 4],
]
num_dices = len(dices_weights)
num_throws = 1_000
dices_sequence = RNG.choice(np.arange(num_dices), size=num_throws)


%timeit dice_simulator_select(dices_sequence, dices_weights)
%timeit dice_simulator_piecewise(dices_sequence, dices_weights)

# 311 µs ± 5.94 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)
# 240 µs ± 10.3 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)
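%timeit only works in IPython or Jupyter. Outside of them, the standard-library timeit module gives a similar comparison. A minimal sketch, assuming a Generator in place of the legacy rnd functions, that times a per-throw loop in the style of the question against the np.where approach from the next answer (loop_version and where_version are hypothetical names; timings will vary by machine, so none are shown):

```python
import timeit

import numpy as np

rng = np.random.default_rng()
unfair_p = [1 / 12, 1 / 12, 1 / 12, 1 / 4, 1 / 4, 1 / 4]


def loop_version(dices):
    # Per-throw loop in the style of the question: one RNG call per throw.
    results = np.zeros(len(dices), dtype=np.int64)
    for i, dice in enumerate(dices):
        results[i] = (rng.choice(6, p=unfair_p) if dice else rng.choice(6)) + 1
    return results


def where_version(dices):
    # np.where approach: draw for both dice, keep one value per throw.
    loaded = rng.choice(6, dices.shape, p=unfair_p)
    fair = rng.choice(6, dices.shape)
    return np.where(dices, loaded, fair) + 1


dices = rng.integers(0, 2, size=1_000)
for fn in (loop_version, where_version):
    # Best of 3 repeats of 10 calls each, reported per call.
    best = min(timeit.repeat(lambda f=fn: f(dices), number=10, repeat=3)) / 10
    print(f"{fn.__name__}: {best * 1e6:.0f} µs per call")
```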

Answer:

This is the correct way to do it:

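import numpy as np
import numpy.random as rnd

def dice_simulator(dices: np.ndarray) -> np.ndarray:
    # Draw a result for every throw from both dice, then keep the loaded
    # die's value where dices is 1 and the fair die's value where it is 0.
    return np.where(
        dices,
        rnd.choice(6, dices.shape, p = [1/12, 1/12, 1/12, 1/4, 1/4, 1/4]),
        rnd.choice(6, dices.shape)
    ) + 1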
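This still draws two values per throw and keeps one. If exactly one random number per throw is wanted, one option not used by any answer here is inverse-transform sampling: draw a single uniform value per throw and look it up in the cumulative probabilities of the die used on that throw. A sketch under that assumption; dice_simulator_icdf and weights_per_die are hypothetical names:

```python
import numpy as np

rng = np.random.default_rng()


def dice_simulator_icdf(dices, weights_per_die):
    """Hypothetical helper: one uniform draw per throw, mapped through that die's CDF."""
    dices = np.asarray(dices)
    cdf = np.cumsum(np.asarray(weights_per_die, dtype=float), axis=1)  # (num_dice, 6)
    cdf[:, -1] = 1.0  # guard against rounding leaving the last entry just below 1
    u = rng.random(len(dices))  # exactly one uniform in [0, 1) per throw
    # The face is 1 + the number of CDF entries <= u for the die used on that throw.
    return (u[:, None] >= cdf[dices]).sum(axis=1) + 1


weights = [
    [1 / 6] * 6,                                   # die 0: fair
    [1 / 12, 1 / 12, 1 / 12, 1 / 4, 1 / 4, 1 / 4],  # die 1: the loaded die
]
throws = dice_simulator_icdf(rng.integers(0, 2, size=10), weights)
```

The trade-off is an intermediate (number of throws) x 6 comparison matrix, which is cheap for six faces.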

Answer:

Try this:

import numpy as np

def dice_simulator(dices):
    p = [1 / 12, 1 / 12, 1 / 12, 1 / 4, 1 / 4, 1 / 4]
    size = dices.shape
    fair_die = np.random.choice(6, size=size)
    unfair_die = np.random.choice(6, p=p, size=size)
    # Boolean masks zero out the die that was not used on each throw.
    return (dices == 0) * fair_die + (dices == 1) * unfair_die + 1
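For an array containing only 0s and 1s, this arithmetic masking gives the same result as np.where. They differ on any other value: both masks are False, so the result silently becomes 1, whereas np.where treats any nonzero value as the loaded die. A small deterministic check, with made-up stand-in arrays in place of the random draws:

```python
import numpy as np

dices = np.array([0, 1, 1, 0, 2])      # the trailing 2 is deliberately invalid
fair_die = np.array([0, 1, 2, 3, 4])   # stand-in draws (face minus 1)
unfair_die = np.array([5, 5, 5, 5, 5])

masked = (dices == 0) * fair_die + (dices == 1) * unfair_die + 1
where = np.where(dices, unfair_die, fair_die) + 1

print(masked)  # [1 6 6 4 1]
print(where)   # [1 6 6 4 6]
```

If invalid die indices are possible, validating the input first (for example with np.isin(dices, [0, 1]).all()) makes either version safer.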