I have a boolean matrix in numpy with shape (m, n)
.
I initialize the matrix elements to be False
.
I want to randomly set exactly x
elements in each row (x < n
) with the value True
.
Now I go over the matrix with a loop, using np.random.choice
with no replacement:
mat = np.full((M, N), fill_value=False)
for i in range(mat.shape[0]):
mat[i, np.random.choice(mat.shape[1], x, replace=False)] = True
Is there a more efficient way to do this with numpy
?
CodePudding user response:
np.random.choice
is suboptimal when the number of value to pick is small compared to the size of the array. This is because the current implementation use a partitioning method. A faster implementation consist in picking some random positions, hen check is there are duplicates and repeat this process until all the positions are different (which is very likely when x/N
is very small (when x/N < 0.05
, the probability to generate correct numbers per iteration is >0.95). Numba can speed up this process. Here is the resulting code:
import numba as nb
@nb.njit('(int_, int_, int_[::1])')
def pick(x, N, out):
assert out.size == x
if x / N <= 0.05:
while True:
for j in range(x):
out[j] = np.random.randint(0, N)
out.sort()
ok = True
for i in range(x-1):
if out[i] == out[i 1]:
ok = False
if ok: return
out[:] = np.random.choice(N, x, replace=False)
@nb.njit('bool_[:,::1](int_, int_, int_)')
def compute(M, N, x):
mat = np.zeros((M, N), dtype=np.bool_)
cols = np.empty(x, np.int_)
for i in range(M):
pick(x, N, cols)
for j in cols:
mat[i, j] = True
return mat
N, M = 1000, 1000
x = 10
mat = compute(M, N, x)
An even faster and simpler approach is to set directly the values in the array as proposed by Kelly Bundy. This as the benefit of avoiding a slow sort operation. Here is the resulting code:
import numba as nb
@nb.njit('bool_[:,::1](int_, int_, int_)')
def compute(M, N, x):
mat = np.zeros((M, N), dtype=np.bool_)
cols = np.empty(x, np.int_)
for i in range(M):
if x/N <= 0.20:
k = 0
while k < x:
j = np.random.randint(0, N)
if not mat[i, j]:
mat[i, j] = True
k = 1
else:
for j in np.random.choice(N, x, replace=False):
mat[i, j] = True
return mat
N, M = 1000, 1000
x = 10
mat = compute(M, N, x)
This is 276 times faster than the initial approach on my machine and also much faster than the other answers.
Results
Initial: 27.61 ms
Salvatore D.B.: 20.54 ms
D.Manasreh: 14.90 ms
Numba V1: 0.66 ms
Numba V2: 0.10 ms <---
CodePudding user response:
Try something like this:
import numpy as np
m = 100
n = 100
x = 60
# generate a matrix of shape (m * n) with x true values per row
mat0 = np.tile(np.repeat([True, False], [x, (n-x)]), [m, 1])
# permute on rows
mat = np.random.default_rng().permuted(mat0, axis=1)
CodePudding user response:
#input
M, N, x = 100,9,3
mat = np.full((M, N), fill_value=False)
#solution
mat[np.repeat(np.arange(M), x), np.ravel([np.random.permutation(N)[:x] for i in range(M)])]=True
Output:
array([[False, True, True, False, False, False, False, False, True],
[ True, True, False, True, False, False, False, False, False],
[False, True, False, False, False, False, True, False, True],
[ True, True, False, True, False, False, False, False, False],
[False, False, False, True, False, False, True, False, True],
[False, True, False, True, False, True, False, False, False],
[False, False, False, True, False, True, True, False, False],
[ True, False, False, False, False, True, True, False, False],
[ True, False, False, True, True, False, False, False, False],
[ True, False, False, False, False, False, True, True, False],
[ True, True, False, False, False, True, False, False, False],
[ True, True, False, False, False, False, True, False, False],
[ True, False, True, False, False, False, False, True, False],
[False, False, False, False, False, True, True, True, False],
[False, False, True, True, False, True, False, False, False],
[False, False, False, True, False, True, True, False, False],
[False, True, False, True, False, False, False, False, True],
[False, False, False, False, True, True, False, True, False],
[False, False, False, False, False, True, True, False, True],
[ True, False, True, False, False, False, False, False, True],
[False, True, False, False, False, False, False, True, True],
[ True, False, False, False, False, True, False, True, False],
[False, False, True, False, False, False, False, True, True],
[ True, False, False, False, False, True, True, False, False],
[ True, True, False, False, False, False, False, True, False],
[False, False, False, True, False, False, True, False, True],
[False, False, True, True, False, False, False, True, False],
[False, False, False, False, False, True, False, True, True],
[False, False, False, False, True, True, False, False, True],
[False, False, True, False, False, True, False, True, False],
[False, True, False, False, True, True, False, False, False],
[False, False, True, True, False, False, True, False, False],
[False, False, False, True, False, True, False, True, False],
[ True, False, False, False, False, False, True, True, False],
[False, True, False, False, True, False, False, False, True],
[False, False, False, True, False, False, False, True, True],
[False, False, True, False, True, True, False, False, False],
[False, True, True, False, False, False, False, False, True],
[False, False, True, True, False, False, True, False, False],
[False, False, False, True, False, True, True, False, False],
[False, True, True, False, False, False, False, False, True],
[False, False, True, False, False, True, False, True, False],
[False, True, False, False, True, False, True, False, False],
[False, True, False, False, False, False, False, True, True],
[False, False, False, True, False, False, False, True, True],
[ True, False, False, True, True, False, False, False, False],
[False, False, False, True, False, True, False, False, True],
[False, False, True, False, True, False, True, False, False],
[ True, False, False, False, False, False, True, False, True],
[ True, True, False, False, False, True, False, False, False],
[False, False, True, True, False, False, True, False, False],
[False, False, True, True, True, False, False, False, False],
[False, False, True, False, True, False, False, False, True],
[False, False, True, False, False, True, True, False, False],
[ True, False, False, True, False, True, False, False, False],
[ True, False, False, False, True, False, False, True, False],
[False, True, True, False, False, False, True, False, False],
[False, False, False, False, False, False, True, True, True],
[ True, False, True, False, False, False, True, False, False],
[False, True, True, False, False, False, False, True, False],
[False, True, False, True, False, False, False, True, False],
[False, True, False, True, True, False, False, False, False],
[ True, False, False, False, False, False, True, True, False],
[ True, False, True, False, False, False, True, False, False],
[False, False, True, False, False, False, True, True, False],
[False, False, False, False, False, True, False, True, True],
[False, False, False, False, False, True, True, True, False],
[False, False, False, False, False, True, True, True, False],
[ True, True, False, False, True, False, False, False, False],
[ True, False, True, False, False, False, True, False, False],
[False, False, False, False, False, False, True, True, True],
[False, True, False, False, True, True, False, False, False],
[False, True, False, True, False, False, False, True, False],
[False, False, False, False, True, True, False, False, True],
[ True, True, False, False, False, False, True, False, False],
[False, True, False, False, True, False, False, True, False],
[False, False, True, True, False, True, False, False, False],
[False, True, False, True, False, True, False, False, False],
[False, False, False, True, False, True, False, False, True],
[ True, False, False, False, False, False, True, False, True],
[False, False, False, True, False, False, True, True, False],
[False, False, False, False, True, True, True, False, False],
[False, False, False, True, False, False, False, True, True],
[False, True, False, False, False, True, True, False, False],
[False, False, False, True, False, True, False, True, False],
[False, True, False, False, True, False, False, True, False],
[False, True, False, False, True, False, False, False, True],
[False, False, False, False, False, True, False, True, True],
[ True, False, False, False, True, False, False, True, False],
[ True, False, True, False, False, True, False, False, False],
[False, False, True, False, False, False, True, False, True],
[False, False, True, True, False, True, False, False, False],
[False, False, False, False, True, True, True, False, False],
[False, False, False, False, True, False, False, True, True],
[False, False, False, False, True, False, True, False, True],
[False, True, True, True, False, False, False, False, False],
[ True, False, False, False, False, True, False, False, True],
[ True, True, False, True, False, False, False, False, False],
[ True, False, False, False, True, False, False, False, True],
[ True, False, False, True, True, False, False, False, False]])
The time required in my machine is 0.002034902572631836
seconds, vs your solution that requires 0.0050237178802490234
#check the results
(mat.sum(1)==3).all() #True
mat.sum(0) #array([34, 30, 33, 46, 23, 35, 36, 31, 32])
CodePudding user response:
This solution is heavily inspired by Jérôme Richard's implementation of Kelly Bundy's approach, but with guaranteed x iterations per row. I don't know why it is slower than their x/N <= .2
branch.
import numba as nb # tested with numba 0.55.1
import numpy as np
@nb.njit('bool_[:,::1](int_, int_, int_)')
def compute1(M, N, x):
mat = np.zeros((M, N), dtype=np.bool_)
for i in range(M):
for j in range(N-x, N):
y = np.random.randint(j 1)
if mat[i, y]: y = j
mat[i, y] = True
return mat