I want to speed up the code below - namely the for loop. Is there a way to do it in numpy?
import numpy as np

# define seed and random state
rng = np.random.default_rng(0)
# number of throws
N = 10**1
# max number of trials
total_n_trials = 10
# generate the throws' distributions of "performance" - overall scores
# mu_throws = rng.normal(0.7, 0.1, N)
mu_throws = np.linspace(0, 1, N)
all_trials = np.zeros(N*total_n_trials)
for i in range(N):
    # generate trials per throw as Bernoulli trials with given mean
    all_trials[i*total_n_trials:(i+1)*total_n_trials] = rng.choice(
        [0, 1], size=total_n_trials, p=[1 - mu_throws[i], mu_throws[i]])
More explanation: I want to generate N sequences of Bernoulli trials (i.e. 0s and 1s, called throws) where each sequence has a mean (probability p) given by the values in another array (mu_throws). These means can be sampled from a normal distribution, but here for simplicity I took them to be a sequence of N=10 numbers from 0 to 1. The approach above works but is slow. I expect N to be at least $10^4$, and total_n_trials can then be anything on the order of hundreds to (tens of) thousands (and this is run several times). I have checked the following post but didn't find an answer. I also know that numpy random choice can generate multidimensional arrays, but I didn't find a way to set a different p for each row. Basically I want the same as what I do above, just reshaped:
all_trials.reshape(N, -1)
array([[0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
[0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
[0., 0., 0., 0., 0., 0., 1., 1., 0., 0.],
[1., 0., 0., 1., 0., 0., 0., 1., 1., 0.],
[1., 0., 1., 0., 0., 1., 0., 1., 0., 1.],
[1., 0., 1., 0., 0., 0., 1., 1., 0., 0.],
[1., 0., 0., 1., 0., 1., 0., 1., 1., 0.],
[1., 1., 1., 1., 1., 1., 1., 1., 1., 1.],
[1., 1., 1., 1., 1., 1., 1., 1., 1., 1.],
[1., 1., 1., 1., 1., 1., 1., 1., 1., 1.]])
I also thought about this trick but haven't figured out how to use it for Bernoulli trials. Thanks for any tips.
CodePudding user response:
While np.random.Generator.binomial() (or the legacy np.random.binomial()) is probably the cleanest way of computing what you are after (as suggested in @MechanicPig's answer), it does not look like it is the fastest.
Actually, with the input sizes you indicated as relevant, OP's approach can even be faster.
Instead, an approach based on thresholding a uniformly distributed random array (essentially a polished version of @SalvatoreDanieleBianco's answer) seems to be the fastest.
Also, the newer random generators seem to be faster.
Finally, the thresholding approach can also be implemented in Numba, but this does not result in a significant speed-up.
Here are some equivalent approaches:
import numpy as np
import numba as nb
import random


def bin_samples_OP(n, mu, seed=0):
    m = len(mu)
    rng = np.random.default_rng(seed)
    result = np.zeros((m, n), dtype=np.uint8)
    for i in range(m):
        result[i, :] = rng.choice([0, 1], size=n, p=[1 - mu[i], mu[i]])
    return result


def bin_samples_binom(n, mu, seed=0):
    np.random.seed(seed)
    return np.random.binomial(1, mu[:, None], (len(mu), n))


def bin_samples_binom2(n, mu, seed=0):
    rng = np.random.default_rng(seed)
    return rng.binomial(1, mu[:, None], (len(mu), n))


def bin_samples_rand(n, mu, seed=0):
    np.random.seed(seed)
    return (np.random.random_sample((len(mu), n)) < mu[:, None]).astype(np.uint8)


def bin_samples_rand2(n, mu, seed=0):
    rng = np.random.default_rng(seed)
    return (rng.random(size=(len(mu), n)) < mu[:, None]).astype(np.uint8)


def bin_samples_rand3(n, mu, seed=0):
    rng = np.random.default_rng(seed)
    return (rng.random(size=(len(mu), n)).T < mu).astype(np.uint8).T


@nb.njit
def bin_samples_nb(n, mu, seed=0):
    m = len(mu)
    random.seed(seed)
    result = np.empty((m, n), dtype=np.uint8)
    for i in range(m):
        for j in range(n):
            result[i, j] = random.random() < mu[i]
    return result
Following are a couple of tests to check that they are equivalent:
funcs = (
    bin_samples_OP, bin_samples_binom, bin_samples_binom2,
    bin_samples_rand, bin_samples_rand2, bin_samples_rand3,
    bin_samples_nb)

n = 10
m = 5
mu = np.linspace(0, 1, m)
print(f"{'Target':>24} {mu}")
for i, func in enumerate(funcs, 1):
    res = func(n, mu)
    mu_exp = np.mean(res, -1)
    is_good = np.isclose(mu, mu_exp, atol=0.1, rtol=0.1).astype(np.uint8)
    print(f"{func.__name__:>24} {mu_exp} {is_good}")
Target [0. 0.25 0.5 0.75 1. ]
bin_samples_OP [0. 0.31 0.53 0.73 1. ] [1 1 1 1 1]
bin_samples_binom [0. 0.22 0.5 0.77 1. ] [1 1 1 1 1]
bin_samples_binom2 [0. 0.31 0.56 0.68 1. ] [1 1 1 1 1]
bin_samples_rand [0. 0.22 0.5 0.77 1. ] [1 1 1 1 1]
bin_samples_rand2 [0. 0.22 0.47 0.77 1. ] [1 1 1 1 1]
bin_samples_rand3 [0. 0.22 0.47 0.77 1. ] [1 1 1 1 1]
bin_samples_nb [0. 0.35 0.53 0.78 1. ] [1 1 1 1 1]
bin_samples_pnb [0. 0.22 0.5 0.78 1. ] [1 1 1 1 1]
(bin_samples_pnb() is the same as bin_samples_nb() but with @nb.njit(parallel=True) and range() replaced by nb.prange().)
And here are their respective timings, for input sizes closer to what you indicated:
timings = {}
for p in range(12, 14):
    for q in range(12, 15):
        n = 2 ** p
        m = 2 ** q
        mu = np.linspace(0, 1, m)
        print(f"n={n}, m={m}; (p={p}, q={q})")
        timings[n, m] = []
        for func in funcs:
            res = func(n, mu)
            mu_exp = np.mean(res, axis=-1)
            is_good = np.allclose(mu, mu_exp, atol=0.1, rtol=0.1)
            timed = %timeit -r 1 -n 1 -o -q func(n, mu)
            timings[n, m].append(timed.best)
            print(f"{func.__name__:>24} {is_good} {float(timed.best * 10 ** 3):>10.4f} ms")
n=4096, m=4096; (p=12, q=12)
bin_samples_OP True 533.5690 ms
bin_samples_binom True 517.4861 ms
bin_samples_binom2 True 487.3975 ms
bin_samples_rand True 202.7566 ms
bin_samples_rand2 True 135.5819 ms
bin_samples_rand3 True 139.0170 ms
bin_samples_nb True 110.7469 ms
bin_samples_pnb True 141.4636 ms
n=4096, m=8192; (p=12, q=13)
bin_samples_OP True 1031.1586 ms
bin_samples_binom True 1035.9100 ms
bin_samples_binom2 True 957.9257 ms
bin_samples_rand True 392.7915 ms
bin_samples_rand2 True 246.8801 ms
bin_samples_rand3 True 250.9648 ms
bin_samples_nb True 222.7287 ms
bin_samples_pnb True 270.5297 ms
n=4096, m=16384; (p=12, q=14)
bin_samples_OP True 2055.6572 ms
bin_samples_binom True 2036.8609 ms
bin_samples_binom2 True 1865.3614 ms
bin_samples_rand True 744.9770 ms
bin_samples_rand2 True 493.8508 ms
bin_samples_rand3 True 476.5701 ms
bin_samples_nb True 449.1700 ms
bin_samples_pnb True 523.8716 ms
n=8192, m=4096; (p=13, q=12)
bin_samples_OP True 886.8198 ms
bin_samples_binom True 1021.3093 ms
bin_samples_binom2 True 946.2922 ms
bin_samples_rand True 372.9902 ms
bin_samples_rand2 True 221.8423 ms
bin_samples_rand3 True 234.7383 ms
bin_samples_nb True 218.5655 ms
bin_samples_pnb True 276.3884 ms
n=8192, m=8192; (p=13, q=13)
bin_samples_OP True 1744.0382 ms
bin_samples_binom True 2101.7884 ms
bin_samples_binom2 True 1985.6555 ms
bin_samples_rand True 720.7352 ms
bin_samples_rand2 True 462.9820 ms
bin_samples_rand3 True 453.3408 ms
bin_samples_nb True 455.7062 ms
bin_samples_pnb True 541.0242 ms
n=8192, m=16384; (p=13, q=14)
bin_samples_OP True 3490.9539 ms
bin_samples_binom True 4106.5607 ms
bin_samples_binom2 True 3719.5692 ms
bin_samples_rand True 1363.6868 ms
bin_samples_rand2 True 884.4729 ms
bin_samples_rand3 True 868.3783 ms
bin_samples_nb True 888.8390 ms
bin_samples_pnb True 1030.0389 ms
These indicate that bin_samples_rand2() / bin_samples_rand3() (which perform essentially equivalently) and bin_samples_nb() are in the same ballpark.
Forcing parallelism in Numba does not help with speed (because LLVM can produce better-optimized code on its own than with Numba's parallelization model), and it also loses predictability: the output is no longer reproducible for a given seed.
CodePudding user response:
N = 11
mu_throws = np.linspace(0, 1, N)
total_n_trials = 10_000
rng = np.random.default_rng(0)
all_trials = (rng.random((N, total_n_trials)).T < mu_throws).T.astype(int)
all_trials  # shape (N, total_n_trials)
output:
array([[0, 0, 0, ..., 0, 0, 0],
[0, 0, 0, ..., 0, 0, 0],
[1, 0, 0, ..., 0, 0, 0],
...,
[1, 0, 1, ..., 1, 1, 1],
[1, 1, 1, ..., 0, 1, 1],
[1, 1, 1, ..., 1, 1, 1]])
Basically, what I'm doing is generating random real numbers in the interval [0, 1) and converting them into boolean results as a function of the given probabilities (in mu_throws). If you compare mu_throws (the target probability of each row) with the probabilities estimated from all_trials, you get:
np.c_[mu_throws, all_trials.mean(1)]
output:
array([[0. , 0. ],
[0.1 , 0.1003],
[0.2 , 0.1963],
[0.3 , 0.305 ],
[0.4 , 0.4006],
[0.5 , 0.5056],
[0.6 , 0.5992],
[0.7 , 0.6962],
[0.8 , 0.7906],
[0.9 , 0.8953],
[1. , 1. ]])
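As a side note, the double transpose can equivalently be written with explicit broadcasting via `mu_throws[:, None]`; this is a sketch of that variant, using the same sizes as above:

```python
import numpy as np

N = 11
total_n_trials = 10_000
mu_throws = np.linspace(0, 1, N)
rng = np.random.default_rng(0)

# Broadcast mu_throws against the rows: entry (i, j) is 1 with
# probability mu_throws[i]. Shape (N, total_n_trials).
all_trials = (rng.random((N, total_n_trials)) < mu_throws[:, None]).astype(int)

# Row means approximate the target probabilities.
print(np.c_[mu_throws, all_trials.mean(1)])
```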
For the N and total_n_trials values from your example, the time required on my machine is about 0.00014 sec, vs. about 0.00127 sec for your loop.
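That comparison can be reproduced with a small timeit harness; the function names `looped` and `vectorized` are mine, and the absolute numbers will of course differ per machine:

```python
import timeit

import numpy as np

N = 11
total_n_trials = 10_000
mu_throws = np.linspace(0, 1, N)


def looped():
    # Original approach: one rng.choice call per row.
    rng = np.random.default_rng(0)
    out = np.zeros(N * total_n_trials)
    for i in range(N):
        out[i*total_n_trials:(i+1)*total_n_trials] = rng.choice(
            [0, 1], size=total_n_trials, p=[1 - mu_throws[i], mu_throws[i]])
    return out


def vectorized():
    # Thresholding approach: one uniform draw for the whole array.
    rng = np.random.default_rng(0)
    return (rng.random((N, total_n_trials)).T < mu_throws).T.astype(int)


t_loop = timeit.timeit(looped, number=10)
t_vec = timeit.timeit(vectorized, number=10)
print(f"loop: {t_loop:.4f}s  vectorized: {t_vec:.4f}s")
```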
CodePudding user response:
>>> n = 10
>>> total_n_trials = 10
>>> mu_throws = np.linspace(0, 1, n)
>>> np.random.binomial(1, mu_throws[:, None], (n, total_n_trials))
array([[0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0, 1, 1, 0],
[1, 0, 0, 0, 1, 1, 0, 0, 0, 0],
[1, 0, 0, 0, 0, 1, 1, 0, 0, 1],
[0, 1, 1, 1, 1, 0, 1, 1, 1, 0],
[1, 0, 1, 1, 1, 1, 1, 0, 1, 0],
[1, 1, 1, 0, 1, 0, 1, 0, 1, 1],
[1, 1, 1, 1, 1, 1, 1, 1, 1, 1],
[1, 1, 1, 1, 1, 1, 1, 1, 1, 1]])
NumPy now recommends the new random Generator API, so you can consider using numpy.random.Generator.binomial:
>>> rng = np.random.default_rng()
>>> rng.binomial(1, mu_throws[:, None], (n, total_n_trials))
array([[0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 1, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 1, 0, 1, 0],
[0, 0, 0, 0, 1, 0, 1, 0, 1, 0],
[1, 0, 0, 1, 1, 0, 1, 1, 1, 0],
[0, 1, 0, 1, 1, 1, 1, 1, 1, 1],
[0, 0, 1, 1, 1, 1, 1, 0, 1, 0],
[1, 1, 1, 1, 1, 0, 0, 1, 1, 1],
[1, 1, 1, 1, 1, 1, 1, 1, 1, 1],
[1, 1, 1, 1, 1, 1, 1, 1, 1, 1]], dtype=int64)