I want to speed up the code below - namely the for loop. Is there a way to do it in numpy?
import numpy as np

# define seed and random state
rng = np.random.default_rng(0)
# number of throws
N = 10**1
# max number of trials
total_n_trials = 10
# generate the throws' distributions of "performance" - overall scores
# mu_throws = rng.normal(0.7, 0.1, N)
mu_throws = np.linspace(0, 1, N)
all_trials = np.zeros(N*total_n_trials)
for i in range(N):
    # generate trials per throw as Bernoulli trials with given mean
    all_trials[i*total_n_trials:(i+1)*total_n_trials] = rng.choice(
        [0, 1], size=total_n_trials, p=[1 - mu_throws[i], mu_throws[i]])
More explanation: I want to generate N sequences of Bernoulli trials (i.e. 0s and 1s, called throws) where each sequence has a mean (probability p) given by the values in another array (mu_throws). These means can be sampled from a normal distribution, but here for simplicity I took them to be a sequence of N=10 numbers from 0 to 1. The approach above works but is slow. I expect N to be at least $10^4$, and total_n_trials can then be anything on the order of hundreds to (tens of) thousands (and this is run several times). I have checked the following post but didn't find an answer. I also know that numpy random choice can generate multidimensional arrays, but I didn't find a way to set a different p for each row. Basically I want the same as what I do above, just reshaped:
all_trials.reshape(N, -1)
array([[0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
[0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
[0., 0., 0., 0., 0., 0., 1., 1., 0., 0.],
[1., 0., 0., 1., 0., 0., 0., 1., 1., 0.],
[1., 0., 1., 0., 0., 1., 0., 1., 0., 1.],
[1., 0., 1., 0., 0., 0., 1., 1., 0., 0.],
[1., 0., 0., 1., 0., 1., 0., 1., 1., 0.],
[1., 1., 1., 1., 1., 1., 1., 1., 1., 1.],
[1., 1., 1., 1., 1., 1., 1., 1., 1., 1.],
[1., 1., 1., 1., 1., 1., 1., 1., 1., 1.]])
I also thought about this trick but haven't figured out how to use it for Bernoulli trials. Thanks for any tips.
CodePudding user response:
While np.random.Generator.binomial() (or the legacy np.random.binomial()) is probably the cleanest way of computing what you are after (as suggested in @MechanicPig's answer), it does not look like it is the fastest.
Actually, with the input sizes you indicated as relevant, OP's approach can even be faster.
Instead, an approach based on thresholding a uniformly distributed random array (essentially a polished version of @SalvatoreDanieleBianco's answer) seems to be the fastest.
Also, the newer random generators seem to be faster.
Finally, the thresholding approach can also be implemented in Numba, but this does not result in a significant speed-up.
Here are some equivalent approaches:
import numpy as np
import numba as nb
import random


def bin_samples_OP(n, mu, seed=0):
    m = len(mu)
    rng = np.random.default_rng(seed)
    result = np.zeros((m, n), dtype=np.uint8)
    for i in range(m):
        result[i, :] = rng.choice([0, 1], size=n, p=[1 - mu[i], mu[i]])
    return result


def bin_samples_binom(n, mu, seed=0):
    np.random.seed(seed)
    return np.random.binomial(1, mu[:, None], (len(mu), n))


def bin_samples_binom2(n, mu, seed=0):
    rng = np.random.default_rng(seed)
    return rng.binomial(1, mu[:, None], (len(mu), n))


def bin_samples_rand(n, mu, seed=0):
    np.random.seed(seed)
    return (np.random.random_sample((len(mu), n)) < mu[:, None]).astype(np.uint8)


def bin_samples_rand2(n, mu, seed=0):
    rng = np.random.default_rng(seed)
    return (rng.random(size=(len(mu), n)) < mu[:, None]).astype(np.uint8)


def bin_samples_rand3(n, mu, seed=0):
    rng = np.random.default_rng(seed)
    return (rng.random(size=(len(mu), n)).T < mu).astype(np.uint8).T


@nb.njit
def bin_samples_nb(n, mu, seed=0):
    m = len(mu)
    random.seed(seed)
    result = np.empty((m, n), dtype=np.uint8)
    for i in range(m):
        for j in range(n):
            result[i, j] = random.random() < mu[i]
    return result
Following are a couple of tests to check that they are equivalent:
funcs = (
    bin_samples_OP, bin_samples_binom, bin_samples_binom2,
    bin_samples_rand, bin_samples_rand2, bin_samples_rand3,
    bin_samples_nb)

n = 10
m = 5
mu = np.linspace(0, 1, m)
print(f"{'Target':>24} {mu}")
for i, func in enumerate(funcs, 1):
    res = func(n, mu)
    mu_exp = np.mean(res, -1)
    is_good = np.isclose(mu, mu_exp, atol=0.1, rtol=0.1).astype(np.uint8)
    print(f"{func.__name__:>24} {mu_exp} {is_good}")
Target [0. 0.25 0.5 0.75 1. ]
bin_samples_OP [0. 0.31 0.53 0.73 1. ] [1 1 1 1 1]
bin_samples_binom [0. 0.22 0.5 0.77 1. ] [1 1 1 1 1]
bin_samples_binom2 [0. 0.31 0.56 0.68 1. ] [1 1 1 1 1]
bin_samples_rand [0. 0.22 0.5 0.77 1. ] [1 1 1 1 1]
bin_samples_rand2 [0. 0.22 0.47 0.77 1. ] [1 1 1 1 1]
bin_samples_rand3 [0. 0.22 0.47 0.77 1. ] [1 1 1 1 1]
bin_samples_nb [0. 0.35 0.53 0.78 1. ] [1 1 1 1 1]
bin_samples_pnb [0. 0.22 0.5 0.78 1. ] [1 1 1 1 1]
(bin_samples_pnb() is the same as bin_samples_nb() but with @nb.njit(parallel=True) and range() replaced by nb.prange().)
And here are their respective timings, for input sizes closer to what you indicated:
timings = {}
for p in range(12, 14):
    for q in range(12, 15):
        n = 2 ** p
        m = 2 ** q
        mu = np.linspace(0, 1, m)
        print(f"n={n}, m={m}; (p={p}, q={q})")
        timings[n, m] = []
        for func in funcs:
            res = func(n, mu)
            mu_exp = np.mean(res, axis=-1)
            is_good = np.allclose(mu, mu_exp, atol=0.1, rtol=0.1)
            timed = %timeit -r 1 -n 1 -o -q func(n, mu)
            timings[n, m].append(timed.best)
            print(f"{func.__name__:>24} {is_good} {float(timed.best * 10 ** 3):>10.4f} ms")
n=4096, m=4096; (p=12, q=12)
bin_samples_OP True 533.5690 ms
bin_samples_binom True 517.4861 ms
bin_samples_binom2 True 487.3975 ms
bin_samples_rand True 202.7566 ms
bin_samples_rand2 True 135.5819 ms
bin_samples_rand3 True 139.0170 ms
bin_samples_nb True 110.7469 ms
bin_samples_pnb True 141.4636 ms
n=4096, m=8192; (p=12, q=13)
bin_samples_OP True 1031.1586 ms
bin_samples_binom True 1035.9100 ms
bin_samples_binom2 True 957.9257 ms
bin_samples_rand True 392.7915 ms
bin_samples_rand2 True 246.8801 ms
bin_samples_rand3 True 250.9648 ms
bin_samples_nb True 222.7287 ms
bin_samples_pnb True 270.5297 ms
n=4096, m=16384; (p=12, q=14)
bin_samples_OP True 2055.6572 ms
bin_samples_binom True 2036.8609 ms
bin_samples_binom2 True 1865.3614 ms
bin_samples_rand True 744.9770 ms
bin_samples_rand2 True 493.8508 ms
bin_samples_rand3 True 476.5701 ms
bin_samples_nb True 449.1700 ms
bin_samples_pnb True 523.8716 ms
n=8192, m=4096; (p=13, q=12)
bin_samples_OP True 886.8198 ms
bin_samples_binom True 1021.3093 ms
bin_samples_binom2 True 946.2922 ms
bin_samples_rand True 372.9902 ms
bin_samples_rand2 True 221.8423 ms
bin_samples_rand3 True 234.7383 ms
bin_samples_nb True 218.5655 ms
bin_samples_pnb True 276.3884 ms
n=8192, m=8192; (p=13, q=13)
bin_samples_OP True 1744.0382 ms
bin_samples_binom True 2101.7884 ms
bin_samples_binom2 True 1985.6555 ms
bin_samples_rand True 720.7352 ms
bin_samples_rand2 True 462.9820 ms
bin_samples_rand3 True 453.3408 ms
bin_samples_nb True 455.7062 ms
bin_samples_pnb True 541.0242 ms
n=8192, m=16384; (p=13, q=14)
bin_samples_OP True 3490.9539 ms
bin_samples_binom True 4106.5607 ms
bin_samples_binom2 True 3719.5692 ms
bin_samples_rand True 1363.6868 ms
bin_samples_rand2 True 884.4729 ms
bin_samples_rand3 True 868.3783 ms
bin_samples_nb True 888.8390 ms
bin_samples_pnb True 1030.0389 ms
These indicate that bin_samples_rand2() / bin_samples_rand3() (which perform essentially equivalently) and bin_samples_nb() are in the same ballpark.
Forcing parallelism in Numba does not help with speed (because LLVM can produce better-optimized code on its own than with Numba's parallelization model), and it also loses predictability: the output is no longer reproducible for a given seed.
CodePudding user response:
N = 11
mu_throws = np.linspace(0, 1, N)
total_n_trials = 10_000
rng = np.random.default_rng(0)
all_trials = (rng.random((N, total_n_trials)).T < mu_throws).T.astype(int)
all_trials  # shape (N, total_n_trials)
output:
array([[0, 0, 0, ..., 0, 0, 0],
[0, 0, 0, ..., 0, 0, 0],
[1, 0, 0, ..., 0, 0, 0],
...,
[1, 0, 1, ..., 1, 1, 1],
[1, 1, 1, ..., 0, 1, 1],
[1, 1, 1, ..., 1, 1, 1]])
Basically, what I'm doing is generating random real numbers in the interval [0, 1) and converting them into boolean results as a function of the given probabilities (in mu_throws). If you compare mu_throws (the target probability of each row) with the probabilities estimated from all_trials, you get:
np.c_[mu_throws, all_trials.mean(1)]
output:
array([[0. , 0. ],
[0.1 , 0.1003],
[0.2 , 0.1963],
[0.3 , 0.305 ],
[0.4 , 0.4006],
[0.5 , 0.5056],
[0.6 , 0.5992],
[0.7 , 0.6962],
[0.8 , 0.7906],
[0.9 , 0.8953],
[1. , 1. ]])
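As a side note, the double transpose can equivalently be written with explicit broadcasting via `mu_throws[:, None]`; this is a sketch of that variant, using the same sizes as above:

```python
import numpy as np

N = 11
total_n_trials = 10_000
mu_throws = np.linspace(0, 1, N)
rng = np.random.default_rng(0)

# Broadcast mu_throws against the rows: entry (i, j) is 1 with
# probability mu_throws[i]. Shape (N, total_n_trials).
all_trials = (rng.random((N, total_n_trials)) < mu_throws[:, None]).astype(int)

# Row means approximate the target probabilities.
print(np.c_[mu_throws, all_trials.mean(1)])
```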
For the N and total_n_trials values from your example, the time required on my machine is about 0.00014 sec, vs. about 0.00127 sec for your loop.
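That comparison can be reproduced with a small timeit harness; the function names `looped` and `vectorized` are mine, and the absolute numbers will of course differ per machine:

```python
import timeit

import numpy as np

N = 11
total_n_trials = 10_000
mu_throws = np.linspace(0, 1, N)


def looped():
    # Original approach: one rng.choice call per row.
    rng = np.random.default_rng(0)
    out = np.zeros(N * total_n_trials)
    for i in range(N):
        out[i*total_n_trials:(i+1)*total_n_trials] = rng.choice(
            [0, 1], size=total_n_trials, p=[1 - mu_throws[i], mu_throws[i]])
    return out


def vectorized():
    # Thresholding approach: one uniform draw for the whole array.
    rng = np.random.default_rng(0)
    return (rng.random((N, total_n_trials)).T < mu_throws).T.astype(int)


t_loop = timeit.timeit(looped, number=10)
t_vec = timeit.timeit(vectorized, number=10)
print(f"loop: {t_loop:.4f}s  vectorized: {t_vec:.4f}s")
```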
CodePudding user response:
>>> n = 10
>>> total_n_trials = 10
>>> mu_throws = np.linspace(0, 1, n)
>>> np.random.binomial(1, mu_throws[:, None], (n, total_n_trials))
array([[0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0, 1, 1, 0],
[1, 0, 0, 0, 1, 1, 0, 0, 0, 0],
[1, 0, 0, 0, 0, 1, 1, 0, 0, 1],
[0, 1, 1, 1, 1, 0, 1, 1, 1, 0],
[1, 0, 1, 1, 1, 1, 1, 0, 1, 0],
[1, 1, 1, 0, 1, 0, 1, 0, 1, 1],
[1, 1, 1, 1, 1, 1, 1, 1, 1, 1],
[1, 1, 1, 1, 1, 1, 1, 1, 1, 1]])
NumPy now recommends the new random Generator API, so you can consider using numpy.random.Generator.binomial:
>>> rng = np.random.default_rng()
>>> rng.binomial(1, mu_throws[:, None], (n, total_n_trials))
array([[0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 1, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 1, 0, 1, 0],
[0, 0, 0, 0, 1, 0, 1, 0, 1, 0],
[1, 0, 0, 1, 1, 0, 1, 1, 1, 0],
[0, 1, 0, 1, 1, 1, 1, 1, 1, 1],
[0, 0, 1, 1, 1, 1, 1, 0, 1, 0],
[1, 1, 1, 1, 1, 0, 0, 1, 1, 1],
[1, 1, 1, 1, 1, 1, 1, 1, 1, 1],
[1, 1, 1, 1, 1, 1, 1, 1, 1, 1]], dtype=int64)