I need a vectorized way to filter a 2D array by the rows of another 2D array, where 0 in the second array acts as a mask.
Example with a for loop:
import numpy as np

ma = np.array([[0, 0, 2, 0, 0, 0, 3, 0],
               [0, 0, 3, 0, 0, 0, 0, 2],
               [0, 0, 2, 1, 0, 0, 0, 0],
               [1, 0, 0, 0, 0, 3, 0, 0]])
ds = np.array([[2, 3, 3, 2, 1, 1, 1, 2],
               [3, 3, 2, 2, 3, 2, 3, 3],
               [3, 3, 2, 2, 3, 3, 3, 2],
               [2, 1, 1, 3, 3, 3, 1, 2],
               [1, 3, 2, 1, 1, 2, 1, 1],
               [2, 3, 3, 2, 1, 1, 3, 2],
               [3, 1, 2, 3, 3, 2, 3, 3],
               [2, 1, 1, 2, 1, 2, 1, 1],
               [2, 3, 3, 1, 3, 2, 3, 3],
               [2, 1, 1, 3, 3, 3, 1, 2]])

result = np.zeros([ds.shape[0], ma.shape[0]], dtype=bool)
for i in range(ma.shape[0]):
    ds_filtered = np.take(ds, ma[i, :].nonzero(), axis=1).squeeze()
    result[:, i] = np.equal(ds_filtered, ma[i, ma[i, :] != 0]).all(axis=1)
result:
array([[False, True, False, False],
[ True, False, False, False],
[ True, False, False, False],
[False, False, False, False],
[False, False, True, False],
[False, True, False, False],
[ True, False, False, False],
[False, False, False, False],
[False, False, False, False],
[False, False, False, False]])
The code finds the rows in ds that are equal to a row in ma, ignoring the zeros. A zero in ma matches ANY number in ds (a kind of wildcard mask). The search is repeated for every row in ma, so result[j, i] is True when row j of ds matches row i of ma.
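The matching rule described above can also be written as a single broadcast comparison. The following is only a sketch of one possible formulation, not the approach used elsewhere in this thread: the function name match_rows and the small demo arrays are made up for illustration. Because it only uses indexing, comparison operators and .all, the same expression would presumably run on CuPy arrays as well, but that is an assumption and is not tested here.

```python
import numpy as np

def match_rows(ma, ds):
    """Return a (len(ds), len(ma)) boolean array. Entry [j, i] is True when
    ds[j] equals ma[i] at every position where ma[i] is non-zero."""
    wildcard = (ma == 0)[None, :, :]           # shape (1, n_ma, n_cols)
    equal = ds[:, None, :] == ma[None, :, :]   # shape (n_ds, n_ma, n_cols)
    # A column passes if it is a wildcard or the values agree
    return (wildcard | equal).all(axis=2)

# Hypothetical inputs; the rows of ma_demo have different numbers of zeros
ma_demo = np.array([[0, 2, 0],
                    [1, 0, 3]])
ds_demo = np.array([[1, 2, 3],
                    [1, 5, 3],
                    [4, 2, 0]])
demo = match_rows(ma_demo, ds_demo)
print(demo)
```

Note that the intermediate array has shape (n_ds, n_ma, n_cols), so for very large inputs it would have to be computed in pieces.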
Task 1 (minimum): rewrite this code without the for loop, i.e. make a vectorized solution.
Task 2 (optimal): adapt the vectorized solution from Task 1 to CuPy.
Task 3 (maximum): rewrite it as a CUDA kernel that can be used as a custom kernel in CuPy or Numba.
Task 4 (superior): optimize Task 2 or Task 3 to use bit-packed data (np.packbits & np.view) instead of a boolean array (8 bits per element). This makes the arrays 8x smaller (the real data is more than 400 GB).
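For the storage side of Task 4, here is a minimal sketch of how a boolean array can be packed to one bit per element with np.packbits and recovered with np.unpackbits. It only illustrates the round trip; the array result_demo is a made-up example, and whether the comparison itself can run on packed data is not addressed here.

```python
import numpy as np

# Hypothetical boolean array, shaped like a few rows of a match result
result_demo = np.array([[False, True, False, False],
                        [True, False, False, False],
                        [False, False, True, False]])

# Pack each row: 8 booleans per byte, padded with zero bits at the end
packed = np.packbits(result_demo, axis=1)   # shape (3, 1), dtype uint8

# Unpack again; count drops the padding bits added by packbits
restored = np.unpackbits(packed, axis=1,
                         count=result_demo.shape[1]).astype(bool)

print(packed.ravel())                          # one byte per row here
print(np.array_equal(restored, result_demo))
```

The 8x saving only materializes when the packed axis is long; with 4 columns per row, as here, each row still occupies a full byte.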
Thanks in advance!
CodePudding user response:
I completed Task 1 myself. Here is the vectorized NumPy version (it relies on every row of ma having the same number of non-zero entries):
nonz = np.apply_along_axis(np.nonzero, 1, ma)
ds_f = np.take(ds, nonz, axis = 1).squeeze()
ma_nonz = ma[ma!=0].reshape(ma.shape[0],-1)
result_vect = np.equal(ds_f, ma_nonz).all(axis=2)
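For Task 2, cupy.nonzero has to be called indirectly, through a wrapper function passed to cupy.apply_along_axis (because CuPy does not apply cupy.nonzero correctly inside cupy.apply_along_axis). The wrapper and the three variants being compared: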
import numpy as np
import cupy as cp

def cupy_nonzero(a):
    # cp.nonzero returns a tuple; keep only the index array
    return cp.nonzero(a)[0]

def filter_standart_loop(ma, ds):
    result = np.zeros([ds.shape[0], ma.shape[0]], dtype=bool)
    for i in range(ma.shape[0]):
        ds_filtered = np.take(ds, ma[i, :].nonzero(), axis=1).squeeze()
        result[:, i] = np.equal(ds_filtered, ma[i, ma[i, :] != 0]).all(axis=1)
    return result

def filter_vector_numpy(ma, ds):
    nonz = np.apply_along_axis(np.nonzero, 1, ma)
    ds_f = np.take(ds, nonz, axis=1).squeeze()
    ma_nonz = ma[ma != 0].reshape(ma.shape[0], -1)
    result_vect = np.equal(ds_f, ma_nonz).all(axis=2)
    return result_vect

def filter_vector_cupy(ma, ds):
    # Copy the inputs to the GPU
    ma = cp.array(ma)
    ds = cp.array(ds)
    nonz = cp.apply_along_axis(cupy_nonzero, 1, ma)
    ds_f = cp.take(ds, nonz, axis=1).squeeze()
    ma_nonz = ma[ma != 0].reshape(ma.shape[0], -1)
    result_vect = cp.equal(ds_f, ma_nonz).all(axis=2)
    # Copy the result back to the host
    return cp.asnumpy(result_vect)
Speed measurement:
%timeit result1 = filter_standart_loop(ma,ds)
55.8 µs ± 951 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
%timeit result2 = filter_vector_numpy(ma,ds)
60.9 µs ± 464 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
%timeit result3 = filter_vector_cupy(ma,ds)
1.66 ms ± 89.3 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
Vectorization in fact takes almost the same time as the standard loop. On a BIG array it takes even longer than the standard loop.
CuPy is effective only on a BIG array. For example:
21.69 sec - standard loop,
22.28 sec - vector_numpy,
6.17 sec - CuPy
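For arrays that large, memory can become the limiting factor before speed does. Below is a rough sketch of one way to keep temporaries bounded: a broadcasting form of the masked comparison, applied to ds in blocks of rows. The function name match_rows_chunked, the chunk_rows parameter and the random test arrays are all assumptions made for illustration, and no timing claim is made for it.

```python
import numpy as np

def match_rows_chunked(ma, ds, chunk_rows=1024):
    """Masked row comparison, processed in blocks of ds rows so that the
    temporary (chunk_rows, n_ma, n_cols) array stays bounded in size."""
    out = np.empty((ds.shape[0], ma.shape[0]), dtype=bool)
    wildcard = (ma == 0)[None, :, :]   # zeros in ma match anything
    for start in range(0, ds.shape[0], chunk_rows):
        block = ds[start:start + chunk_rows]
        equal = block[:, None, :] == ma[None, :, :]
        out[start:start + chunk_rows] = (wildcard | equal).all(axis=2)
    return out

# Hypothetical random data, only to exercise the function
rng = np.random.default_rng(0)
ds_big = rng.integers(1, 4, size=(5000, 8))   # values 1..3
ma_big = rng.integers(0, 4, size=(50, 8))     # values 0..3, 0 = wildcard
res_big = match_rows_chunked(ma_big, ds_big, chunk_rows=512)
print(res_big.shape)
```

The loop here runs over blocks of ds rather than over rows of ma, so chunk_rows can be tuned to the available memory.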