Filtering a 2D array by the rows of another 2D array used as masks

I need a vectorized way to filter a 2D array by the rows of another 2D array, where a '0' in the second array acts as a mask.

Example with a for loop:

import numpy as np
ma = np.array([ [0, 0,  2,  0,  0,  0,  3,  0],
                [0, 0,  3,  0,  0,  0,  0,  2],
                [0, 0,  2,  1,  0,  0,  0,  0],
                [1, 0,  0,  0,  0,  3,  0,  0]])

ds = np.array([[2, 3, 3, 2, 1, 1, 1, 2],
               [3, 3, 2, 2, 3, 2, 3, 3],
               [3, 3, 2, 2, 3, 3, 3, 2],
               [2, 1, 1, 3, 3, 3, 1, 2],
               [1, 3, 2, 1, 1, 2, 1, 1],
               [2, 3, 3, 2, 1, 1, 3, 2],
               [3, 1, 2, 3, 3, 2, 3, 3],
               [2, 1, 1, 2, 1, 2, 1, 1],
               [2, 3, 3, 1, 3, 2, 3, 3],
               [2, 1, 1, 3, 3, 3, 1, 2]])

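result = np.zeros([ds.shape[0], ma.shape[0]], dtype=bool)

for i in range(ma.shape[0]):
    # keep only the columns of ds where row i of ma is non-zero
    ds_filtered = np.take(ds, ma[i, :].nonzero(), axis=1).squeeze()
    # a row of ds matches when all of those columns equal the non-zero mask values
    result[:, i] = np.equal(ds_filtered, ma[i, ma[i, :] != 0]).all(axis=1)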

result:

array([[False,  True, False, False],
       [ True, False, False, False],
       [ True, False, False, False],
       [False, False, False, False],
       [False, False,  True, False],
       [False,  True, False, False],
       [ True, False, False, False],
       [False, False, False, False],
       [False, False, False, False],
       [False, False, False, False]])

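The code finds the rows of 'ds' that equal a row of 'ma' everywhere except where 'ma' is zero. A zero in 'ma' matches ANY number in 'ds' (it works as a kind of mask). The search is repeated for every row of 'ma', so result[i, j] is True when row i of 'ds' matches row j of 'ma'.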

Task 1 (minimum): rewrite this code without the for loop, i.e. as a vectorized solution.

Task 2 (optimal): adapt the vectorized solution from Task 1 to CuPy.

Task 3 (maximum): rewrite it as a CUDA kernel that can be used as a custom kernel in CuPy or Numba.

Task 4 (superior): optimize Task 2 or Task 3 to work on bit-packed data (np.packbits and ndarray.view) instead of a boolean array, which takes 8 bits per element. That would make the arrays 8x smaller (the real data is more than 400 GB).

Thanks in advance!

User response:

I solved Task 1 myself. Here is the vectorized NumPy version:

nonz = np.apply_along_axis(np.nonzero, 1, ma)
ds_f = np.take(ds,  nonz, axis = 1).squeeze()
ma_nonz = ma[ma!=0].reshape(ma.shape[0],-1)

result_vect = np.equal(ds_f, ma_nonz).all(axis=2)
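
The version above relies on every row of ma having the same number of non-zero entries, since both apply_along_axis and the reshape need equally sized rows. Below is a minimal sketch of an alternative, assuming plain NumPy broadcasting is acceptable: a position counts as a match when the mask value is 0 or equals the data value, and a row matches when all positions do. The name filter_broadcast is made up for this illustration. The intermediate array has shape (rows of ds, rows of ma, columns), so memory grows with the product of all three.

```python
import numpy as np

def filter_broadcast(ma, ds):
    """result[i, j] is True when row i of ds matches row j of ma (0 in ma = wildcard)."""
    # ma[None, :, :] has shape (1, M, K); ds[:, None, :] has shape (N, 1, K).
    # Broadcasting compares every row of ds with every row of ma: shape (N, M, K).
    hits = (ma[None, :, :] == 0) | (ma[None, :, :] == ds[:, None, :])
    # A pair matches only if every column matches.
    return hits.all(axis=2)

ma = np.array([[0, 0, 2, 0, 0, 0, 3, 0],
               [0, 0, 3, 0, 0, 0, 0, 2],
               [0, 0, 2, 1, 0, 0, 0, 0],
               [1, 0, 0, 0, 0, 3, 0, 0]])

ds = np.array([[2, 3, 3, 2, 1, 1, 1, 2],
               [3, 3, 2, 2, 3, 2, 3, 3],
               [3, 3, 2, 2, 3, 3, 3, 2],
               [2, 1, 1, 3, 3, 3, 1, 2],
               [1, 3, 2, 1, 1, 2, 1, 1],
               [2, 3, 3, 2, 1, 1, 3, 2],
               [3, 1, 2, 3, 3, 2, 3, 3],
               [2, 1, 1, 2, 1, 2, 1, 1],
               [2, 3, 3, 1, 3, 2, 3, 3],
               [2, 1, 1, 3, 3, 3, 1, 2]])

result_bc = filter_broadcast(ma, ds)
print(result_bc)
```

Because the mask is evaluated per position, this sketch also accepts rows of ma with different numbers of non-zero entries.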

For Task 2, cupy.nonzero has to be called indirectly, through a wrapper function passed to cupy.apply_along_axis, because CuPy does not apply cupy.nonzero correctly when it is passed to cupy.apply_along_axis directly:

import cupy as cp

def cupy_nonzero(a):
    # cp.nonzero returns a tuple; hand back the index array itself
    return cp.nonzero(a)[0]

def filter_standart_loop(ma, ds):
    result = np.zeros([ds.shape[0], ma.shape[0]], dtype=bool)
    for i in range(ma.shape[0]):
        ds_filtered = np.take(ds, ma[i, :].nonzero(), axis=1).squeeze()
        result[:, i] = np.equal(ds_filtered, ma[i, ma[i, :] != 0]).all(axis=1)
    return result

def filter_vector_numpy(ma, ds):
    # every row of ma must have the same number of non-zero entries
    nonz = np.apply_along_axis(np.nonzero, 1, ma)
    ds_f = np.take(ds, nonz, axis=1).squeeze()
    ma_nonz = ma[ma != 0].reshape(ma.shape[0], -1)
    result_vect = np.equal(ds_f, ma_nonz).all(axis=2)
    return result_vect

def filter_vector_cupy(ma, ds):
    # copy the inputs to the GPU
    ma = cp.array(ma)
    ds = cp.array(ds)
    nonz = cp.apply_along_axis(cupy_nonzero, 1, ma)
    ds_f = cp.take(ds, nonz, axis=1).squeeze()
    ma_nonz = ma[ma != 0].reshape(ma.shape[0], -1)
    result_vect = cp.equal(ds_f, ma_nonz).all(axis=2)
    # copy the result back to host memory
    return cp.asnumpy(result_vect)

Speed measurement:

%timeit result1 = filter_standart_loop(ma,ds)
55.8 µs ± 951 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)

%timeit result2 = filter_vector_numpy(ma,ds)
60.9 µs ± 464 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)

%timeit result3 = filter_vector_cupy(ma,ds)
1.66 ms ± 89.3 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

Vectorization in fact takes almost the same time as the standard loop. On a BIG array it takes even more time than the standard loop.
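
One possible reason for the slowdown on big arrays (an assumption, not something measured here) is the size of the intermediate arrays a fully vectorized version has to allocate. A rough sketch of a middle ground: broadcast over blocks of rows of ds so the temporary array stays bounded. The name filter_chunked and the chunk_rows parameter are hypothetical, and the random arrays are stand-ins for real data. A Python loop remains, but it runs over blocks of ds rather than over rows of ma.

```python
import numpy as np

def filter_chunked(ma, ds, chunk_rows=1024):
    """Same result as the loop version, computed block by block over the rows of ds."""
    n, m = ds.shape[0], ma.shape[0]
    result = np.empty((n, m), dtype=bool)
    wildcard = (ma == 0)[None, :, :]            # (1, M, K): positions that match anything
    for start in range(0, n, chunk_rows):
        block = ds[start:start + chunk_rows]    # (c, K), with c <= chunk_rows
        # the temporary array is (c, M, K) instead of (N, M, K)
        hits = wildcard | (ma[None, :, :] == block[:, None, :])
        result[start:start + chunk_rows] = hits.all(axis=2)
    return result

# Stand-in data: values 1..3 in ds, values 0..3 in ma (0 = wildcard).
rng = np.random.default_rng(0)
ds_big = rng.integers(1, 4, size=(5000, 8))
ma_big = rng.integers(0, 4, size=(50, 8))

res = filter_chunked(ma_big, ds_big, chunk_rows=512)
print(res.shape, int(res.sum()))
```

The best chunk_rows value depends on the available memory and cache sizes, so it would have to be tuned by measurement.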

CuPy pays off only on BIG arrays. For example:

21.69 sec - standard loop,

22.28 sec - vector_numpy,

6.17 sec - CuPy
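
Task 4 from the question is not covered by the code above. As a partial, hedged sketch of the storage side only: the boolean result can be packed with np.packbits so that each byte holds 8 flags, and restored with np.unpackbits. The random arrays are stand-ins for real data, and the comparison itself still runs on unpacked values here.

```python
import numpy as np

# Stand-in data: values 1..3 in ds, values 0..3 in ma (0 = wildcard).
rng = np.random.default_rng(1)
ds = rng.integers(1, 4, size=(1000, 8))
ma = rng.integers(0, 4, size=(20, 8))

# Boolean result, one byte per element: shape (N, M).
result = ((ma == 0) | (ma == ds[:, None, :])).all(axis=2)

# Pack along the mask axis: 8 flags per byte, shape (N, ceil(M / 8)).
packed = np.packbits(result, axis=1)

# Unpack on demand; count drops the padding bits added by packbits.
restored = np.unpackbits(packed, axis=1, count=result.shape[1]).astype(bool)

# Some queries work on the packed form directly, e.g. "did any mask match?"
# (the padding bits are zero, so they cannot produce false positives).
any_match = (packed != 0).any(axis=1)

print(result.nbytes, packed.nbytes)
```

For 20 masks each row of ds takes 3 bytes instead of 20; the saving approaches 8x when the number of masks is a multiple of 8.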