Home > Back-end >  Efficiently random set matrix elements in numpy
Efficiently random set matrix elements in numpy

Time:05-05

I have a boolean matrix in numpy with shape (m, n).

I initialize the matrix elements to be False.

I want to randomly set exactly x elements in each row (x < n) with the value True.

Now I go over the matrix with a loop, using np.random.choice with no replacement:

mat = np.full((M, N), fill_value=False)
for i in range(mat.shape[0]):
    mat[i, np.random.choice(mat.shape[1], x, replace=False)] = True

Is there a more efficient way to do this with numpy?

CodePudding user response:

np.random.choice is suboptimal when the number of value to pick is small compared to the size of the array. This is because the current implementation use a partitioning method. A faster implementation consist in picking some random positions, hen check is there are duplicates and repeat this process until all the positions are different (which is very likely when x/N is very small (when x/N < 0.05, the probability to generate correct numbers per iteration is >0.95). Numba can speed up this process. Here is the resulting code:

import numba as nb

@nb.njit('(int_, int_, int_[::1])')
def pick(x, N, out):
    assert out.size == x
    if x / N <= 0.05:
        while True:
            for j in range(x):
                out[j] = np.random.randint(0, N)
            out.sort()
            ok = True
            for i in range(x-1):
                if out[i] == out[i 1]:
                    ok = False
            if ok: return
    out[:] = np.random.choice(N, x, replace=False)

@nb.njit('bool_[:,::1](int_, int_, int_)')
def compute(M, N, x):
    mat = np.zeros((M, N), dtype=np.bool_)
    cols = np.empty(x, np.int_)
    for i in range(M):
        pick(x, N, cols)
        for j in cols:
            mat[i, j] = True
    return mat

N, M = 1000, 1000
x = 10
mat = compute(M, N, x)

An even faster and simpler approach is to set directly the values in the array as proposed by Kelly Bundy. This as the benefit of avoiding a slow sort operation. Here is the resulting code:

import numba as nb

@nb.njit('bool_[:,::1](int_, int_, int_)')
def compute(M, N, x):
    mat = np.zeros((M, N), dtype=np.bool_)
    cols = np.empty(x, np.int_)
    for i in range(M):
        if x/N <= 0.20:
            k = 0
            while k < x:
                j = np.random.randint(0, N)
                if not mat[i, j]:
                    mat[i, j] = True
                    k  = 1
        else:
            for j in np.random.choice(N, x, replace=False):
                mat[i, j] = True
    return mat

N, M = 1000, 1000
x = 10
mat = compute(M, N, x)

This is 276 times faster than the initial approach on my machine and also much faster than the other answers.

Results

Initial:         27.61 ms
Salvatore D.B.:  20.54 ms
D.Manasreh:      14.90 ms
Numba V1:         0.66 ms
Numba V2:         0.10 ms  <---

CodePudding user response:

Try something like this:

import numpy as np

m = 100
n = 100
x = 60

# generate a matrix of shape (m * n) with x true values per row
mat0 = np.tile(np.repeat([True, False], [x, (n-x)]), [m, 1])

# permute on rows
mat = np.random.default_rng().permuted(mat0, axis=1)

CodePudding user response:

#input
M, N, x = 100,9,3
mat = np.full((M, N), fill_value=False)

#solution
mat[np.repeat(np.arange(M), x), np.ravel([np.random.permutation(N)[:x] for i in range(M)])]=True

Output:

array([[False,  True,  True, False, False, False, False, False,  True],
   [ True,  True, False,  True, False, False, False, False, False],
   [False,  True, False, False, False, False,  True, False,  True],
   [ True,  True, False,  True, False, False, False, False, False],
   [False, False, False,  True, False, False,  True, False,  True],
   [False,  True, False,  True, False,  True, False, False, False],
   [False, False, False,  True, False,  True,  True, False, False],
   [ True, False, False, False, False,  True,  True, False, False],
   [ True, False, False,  True,  True, False, False, False, False],
   [ True, False, False, False, False, False,  True,  True, False],
   [ True,  True, False, False, False,  True, False, False, False],
   [ True,  True, False, False, False, False,  True, False, False],
   [ True, False,  True, False, False, False, False,  True, False],
   [False, False, False, False, False,  True,  True,  True, False],
   [False, False,  True,  True, False,  True, False, False, False],
   [False, False, False,  True, False,  True,  True, False, False],
   [False,  True, False,  True, False, False, False, False,  True],
   [False, False, False, False,  True,  True, False,  True, False],
   [False, False, False, False, False,  True,  True, False,  True],
   [ True, False,  True, False, False, False, False, False,  True],
   [False,  True, False, False, False, False, False,  True,  True],
   [ True, False, False, False, False,  True, False,  True, False],
   [False, False,  True, False, False, False, False,  True,  True],
   [ True, False, False, False, False,  True,  True, False, False],
   [ True,  True, False, False, False, False, False,  True, False],
   [False, False, False,  True, False, False,  True, False,  True],
   [False, False,  True,  True, False, False, False,  True, False],
   [False, False, False, False, False,  True, False,  True,  True],
   [False, False, False, False,  True,  True, False, False,  True],
   [False, False,  True, False, False,  True, False,  True, False],
   [False,  True, False, False,  True,  True, False, False, False],
   [False, False,  True,  True, False, False,  True, False, False],
   [False, False, False,  True, False,  True, False,  True, False],
   [ True, False, False, False, False, False,  True,  True, False],
   [False,  True, False, False,  True, False, False, False,  True],
   [False, False, False,  True, False, False, False,  True,  True],
   [False, False,  True, False,  True,  True, False, False, False],
   [False,  True,  True, False, False, False, False, False,  True],
   [False, False,  True,  True, False, False,  True, False, False],
   [False, False, False,  True, False,  True,  True, False, False],
   [False,  True,  True, False, False, False, False, False,  True],
   [False, False,  True, False, False,  True, False,  True, False],
   [False,  True, False, False,  True, False,  True, False, False],
   [False,  True, False, False, False, False, False,  True,  True],
   [False, False, False,  True, False, False, False,  True,  True],
   [ True, False, False,  True,  True, False, False, False, False],
   [False, False, False,  True, False,  True, False, False,  True],
   [False, False,  True, False,  True, False,  True, False, False],
   [ True, False, False, False, False, False,  True, False,  True],
   [ True,  True, False, False, False,  True, False, False, False],
   [False, False,  True,  True, False, False,  True, False, False],
   [False, False,  True,  True,  True, False, False, False, False],
   [False, False,  True, False,  True, False, False, False,  True],
   [False, False,  True, False, False,  True,  True, False, False],
   [ True, False, False,  True, False,  True, False, False, False],
   [ True, False, False, False,  True, False, False,  True, False],
   [False,  True,  True, False, False, False,  True, False, False],
   [False, False, False, False, False, False,  True,  True,  True],
   [ True, False,  True, False, False, False,  True, False, False],
   [False,  True,  True, False, False, False, False,  True, False],
   [False,  True, False,  True, False, False, False,  True, False],
   [False,  True, False,  True,  True, False, False, False, False],
   [ True, False, False, False, False, False,  True,  True, False],
   [ True, False,  True, False, False, False,  True, False, False],
   [False, False,  True, False, False, False,  True,  True, False],
   [False, False, False, False, False,  True, False,  True,  True],
   [False, False, False, False, False,  True,  True,  True, False],
   [False, False, False, False, False,  True,  True,  True, False],
   [ True,  True, False, False,  True, False, False, False, False],
   [ True, False,  True, False, False, False,  True, False, False],
   [False, False, False, False, False, False,  True,  True,  True],
   [False,  True, False, False,  True,  True, False, False, False],
   [False,  True, False,  True, False, False, False,  True, False],
   [False, False, False, False,  True,  True, False, False,  True],
   [ True,  True, False, False, False, False,  True, False, False],
   [False,  True, False, False,  True, False, False,  True, False],
   [False, False,  True,  True, False,  True, False, False, False],
   [False,  True, False,  True, False,  True, False, False, False],
   [False, False, False,  True, False,  True, False, False,  True],
   [ True, False, False, False, False, False,  True, False,  True],
   [False, False, False,  True, False, False,  True,  True, False],
   [False, False, False, False,  True,  True,  True, False, False],
   [False, False, False,  True, False, False, False,  True,  True],
   [False,  True, False, False, False,  True,  True, False, False],
   [False, False, False,  True, False,  True, False,  True, False],
   [False,  True, False, False,  True, False, False,  True, False],
   [False,  True, False, False,  True, False, False, False,  True],
   [False, False, False, False, False,  True, False,  True,  True],
   [ True, False, False, False,  True, False, False,  True, False],
   [ True, False,  True, False, False,  True, False, False, False],
   [False, False,  True, False, False, False,  True, False,  True],
   [False, False,  True,  True, False,  True, False, False, False],
   [False, False, False, False,  True,  True,  True, False, False],
   [False, False, False, False,  True, False, False,  True,  True],
   [False, False, False, False,  True, False,  True, False,  True],
   [False,  True,  True,  True, False, False, False, False, False],
   [ True, False, False, False, False,  True, False, False,  True],
   [ True,  True, False,  True, False, False, False, False, False],
   [ True, False, False, False,  True, False, False, False,  True],
   [ True, False, False,  True,  True, False, False, False, False]])

The time required in my machine is 0.002034902572631836 seconds, vs your solution that requires 0.0050237178802490234

#check the results
(mat.sum(1)==3).all() #True
mat.sum(0) #array([34, 30, 33, 46, 23, 35, 36, 31, 32])

CodePudding user response:

This solution is heavily inspired by Jérôme Richard's implementation of Kelly Bundy's approach, but with guaranteed x iterations per row. I don't know why it is slower than their x/N <= .2 branch.

import numba as nb # tested with numba 0.55.1
import numpy as np

@nb.njit('bool_[:,::1](int_, int_, int_)')
def compute1(M, N, x):
    mat = np.zeros((M, N), dtype=np.bool_)
    for i in range(M):
        for j in range(N-x, N):
            y = np.random.randint(j 1)
            if mat[i, y]: y = j
            mat[i, y] = True
    return mat
  • Related