I have a very long NumPy array:
v = np.array([10, 15, 15, 15, 10, 30, 30, 10, 10])
And I want to insert a random number of 0s in front of each element (and after the last one), adding each 0 with probability
stop_prob = 0.5
So the result could look like:
[ 0 10 0 0 15 0 0 15 15 10 0 0 30 30 10 10 0 0]
Here is my code:
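import numpy as np

v_new = []
for j in range(len(v) + 1):
    # draw 0 with probability stop_prob; every 0 drawn appends a zero
    choice = np.random.choice([1, 0], p=[1 - stop_prob, stop_prob])
    while choice == 0:
        v_new.append(0)
        choice = np.random.choice([1, 0], p=[1 - stop_prob, stop_prob])
    # the extra iteration j == len(v) only adds trailing zeros
    if j != len(v):
        v_new.append(v[j])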
It works, but it takes a lot of time for very large arrays (with millions of values). How can I vectorize this algorithm?
Here is my attempt to vectorize:
idx = np.random.choice([1, 0], size=len(v), p=[1-stop_prob, stop_prob])
v = np.insert(v, idx, 0)
But the result is incorrect:
[ 0 0 0 0 0 0 0 0 10 0 15 15 15 10 30 30 10 10]
It puts all the zeros at the beginning of the array.
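The attempt fails because np.insert treats its second argument as a list of insertion positions: every entry of idx is 0 or 1, so every zero lands in front of v[0] or v[1]. A minimal sketch of a fix that still uses np.insert is to repeat each position once per zero it should receive. It assumes the per-gap zero counts are drawn with rng.negative_binomial(1, 1 - stop_prob, ...), i.e. the number of 0s drawn before the loop moves on to the element, and it uses the Generator API (np.random.default_rng) instead of the legacy np.random functions:

```python
import numpy as np

v = np.array([10, 15, 15, 15, 10, 30, 30, 10, 10])
stop_prob = 0.5
rng = np.random.default_rng()

# one zero count per gap: before each element, plus one gap after the last element
num_zeros = rng.negative_binomial(1, 1 - stop_prob, size=len(v) + 1)

# np.insert puts values *before* the given indices; repeating index i
# num_zeros[i] times inserts that many zeros in front of v[i]
# (index len(v) means "after the last element")
positions = np.repeat(np.arange(len(v) + 1), num_zeros)
v_new = np.insert(v, positions, 0)
print(v_new)
```

np.insert accepts repeated indices and inserts one value per occurrence, so all zeros are placed in a single vectorized call.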
Answer:
If you keep prepending a 0 to each element of v with probability p = stop_prob until the element itself is inserted, then this is a sequence of independent Bernoulli trials.
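For a single element, getting exactly k zeros in front of it takes k "insert a 0" outcomes (probability p each) followed by one "insert the element" outcome (probability 1 - p). Writing K for that count gives the distribution and its mean:

```latex
P(K = k) = p^{k}\,(1 - p), \qquad k = 0, 1, 2, \dots
\qquad
\mathbb{E}[K] = \sum_{k \ge 0} k\, p^{k}\,(1 - p) = \frac{p}{1 - p}
```

With p = 0.5 this is one zero per element on average.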
You can model the random variable "number of 0s before each element" with a Negative Binomial distribution that counts the number of "failures" (0s) before exactly 1 "success", with success probability 1 - p (with a single success this is the geometric distribution counted from 0):
import numpy as np

# number of zeros we will prepend to each element
# note: use len(v) + 1 if we want trailing zeros, like the original algorithm
num_zeros = np.random.negative_binomial(1, 1 - stop_prob, len(v))
# indices where we will place the elements of v
idx = np.arange(0, len(num_zeros))  # original indices
idx += np.cumsum(num_zeros)  # shift right to make space for the zeros
# we build the final array
# note: use (np.max(idx),) if we want trailing zeros
v_new = np.zeros((np.max(idx) + 1,), dtype=v.dtype)
v_new[idx[:len(v)]] = v
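The two "note" comments can be folded into a flag. The sketch below wraps the same cumsum-based placement in a function; the name insert_random_zeros and its trailing and rng parameters are hypothetical, and it assumes the Generator API (rng.negative_binomial(1, q) has the same distribution as rng.geometric(q) - 1):

```python
import numpy as np


def insert_random_zeros(v, stop_prob, trailing=True, rng=None):
    """Hypothetical helper: put K zeros before each element of v (and after
    the last one if trailing=True), where K counts failures before the first
    success with success probability 1 - stop_prob."""
    rng = np.random.default_rng() if rng is None else rng
    v = np.asarray(v)
    n_slots = len(v) + 1 if trailing else len(v)
    if n_slots == 0:
        return v.copy()
    # one zero count per slot
    num_zeros = rng.negative_binomial(1, 1 - stop_prob, size=n_slots)
    # position of element i = i + zeros placed in slots 0..i
    idx = np.arange(n_slots) + np.cumsum(num_zeros)
    # with trailing zeros, idx[-1] equals the total output length
    size = idx[-1] if trailing else idx[-1] + 1
    v_new = np.zeros(size, dtype=v.dtype)
    v_new[idx[:len(v)]] = v
    return v_new
```

For example, insert_random_zeros(v, 0.5) mimics the original loop, and trailing=False matches the answer's default of no zeros after the last element.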
Output of one example run:
>>> num_zeros
array([1, 0, 3, 0, 0, 0, 0, 1, 0])
>>> v_new
array([ 0, 10, 15, 0, 0, 0, 15, 15, 10, 30, 30, 0, 10, 10])
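Feeding the num_zeros from that run back in reproduces the printed array deterministically, which makes the index arithmetic easy to check (cumsum gives [1, 1, 4, 4, 4, 4, 4, 5, 5], and adding the original indices 0..8 gives the positions):

```python
import numpy as np

v = np.array([10, 15, 15, 15, 10, 30, 30, 10, 10])
num_zeros = np.array([1, 0, 3, 0, 0, 0, 0, 1, 0])  # counts from the run above

# position of each element = original index + zeros placed so far
idx = np.arange(len(num_zeros)) + np.cumsum(num_zeros)
print(idx)  # [ 1  2  6  7  8  9 10 12 13]

v_new = np.zeros(np.max(idx) + 1, dtype=v.dtype)
v_new[idx[:len(v)]] = v
print(v_new)  # [ 0 10 15  0  0  0 15 15 10 30 30  0 10 10]
```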