NumPy: How can I use the last dimension of an array as a value?

I have a 3-D array with shape [1080, 1920, 4], where the last axis holds the RGBA channels of a picture, and a dict mapping RGBA values to ints. I want to use np.vectorize to convert this array to a 2-D array with shape [1080, 1920]. How can I pass the array to the vectorized function so that it is treated as a 2-D array whose elements are the length-4 vectors along the last dimension?

array = [[[112,  25, 235, 255],
        [112,  25, 235, 255],
        [112,  25, 235, 255],
        ...,
        [ 35,  35,  30, 255],
        [ 41,  40,  37, 255],
        [ 39,  41,  37, 255]]
        ...,
        [ 35,  35,  30, 255],
        [ 41,  40,  37, 255],
        [ 39,  41,  37, 255]]]
dic = {(35,  35,  30, 255): 1, (41,  40,  37, 255): 2}
np.vectorize(lambda x: dic.get(tuple(x)))()

What should I pass into the last ()?

CodePudding user response：

One way using numpy.apply_along_axis:

# Sample data with shape (3, 3, 4)
import numpy as np

array = np.array([[[112,  25, 235, 255],
                   [112,  25, 235, 255],
                   [112,  25, 235, 255]],

                  [[ 35,  35,  30, 255],
                   [ 41,  40,  37, 255],
                   [ 39,  41,  37, 255]],

                  [[ 35,  35,  30, 255],
                   [ 41,  40,  37, 255],
                   [ 39,  41,  37, 255]]])

dic = {(112, 25, 235, 255): 0,
 (35, 35, 30, 255): 1,
 (41, 40, 37, 255): 2,
 (39, 41, 37, 255): 3}

np.apply_along_axis(lambda x: dic.get(tuple(x)), 2, array)

Output:

array([[0, 0, 0],
       [1, 2, 3],
       [1, 2, 3]])
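
One caveat with this approach: dic.get returns None for any colour that is not in the dict, and as far as I can tell apply_along_axis sizes its output from the first result, so a later None may not fit into an integer array. A minimal sketch, assuming a sentinel is acceptable for unmapped pixels (the name MISSING and the value -1 are my own choices, not part of the question):

```python
import numpy as np

# Sample data with shape (3, 3, 4): three rows of three RGBA pixels.
arr = np.array([
    [[112, 25, 235, 255], [112, 25, 235, 255], [112, 25, 235, 255]],
    [[35, 35, 30, 255], [41, 40, 37, 255], [39, 41, 37, 255]],
    [[35, 35, 30, 255], [41, 40, 37, 255], [39, 41, 37, 255]],
])

# The colour (39, 41, 37, 255) is deliberately left out of the mapping.
dic = {(112, 25, 235, 255): 0, (35, 35, 30, 255): 1, (41, 40, 37, 255): 2}

MISSING = -1  # hypothetical sentinel for colours that have no entry in dic

# Apply the lookup to every 1-D slice along axis 2 (the RGBA axis).
labels = np.apply_along_axis(lambda px: dic.get(tuple(px), MISSING), 2, arr)
print(labels)
```

Every pixel whose colour is missing from dic comes back as MISSING, so the result stays a plain integer array.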

CodePudding user response：

np.vectorize normally passes scalar values to the function, but you can specify a signature so that it receives 1-D slices instead:

In [55]: f=np.vectorize(lambda x: dic.get(tuple(x)),signature='(n)->()')

Using @Chris's example values (stored here as arr):

In [56]: f(arr)
Out[56]: 
array([[0, 0, 0],
       [1, 2, 3],
       [1, 2, 3]])
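
To spell out what the signature does: '(n)->()' says the function consumes a 1-D slice of length n taken from the last axis and returns a scalar, so the loop runs over all leading axes. A small self-contained sketch (the variable names lookup and batch are mine, and the two-image batch is only an illustration):

```python
import numpy as np

# Sample data with shape (3, 3, 4): three rows of three RGBA pixels.
arr = np.array([
    [[112, 25, 235, 255], [112, 25, 235, 255], [112, 25, 235, 255]],
    [[35, 35, 30, 255], [41, 40, 37, 255], [39, 41, 37, 255]],
    [[35, 35, 30, 255], [41, 40, 37, 255], [39, 41, 37, 255]],
])

dic = {(112, 25, 235, 255): 0,
       (35, 35, 30, 255): 1,
       (41, 40, 37, 255): 2,
       (39, 41, 37, 255): 3}

# '(n)->()': each call receives one length-n vector and returns one scalar.
lookup = np.vectorize(lambda px: dic.get(tuple(px)), signature='(n)->()')

labels = lookup(arr)            # loops over the two leading axes
batch = np.stack([arr, arr])    # hypothetical batch of two images, shape (2, 3, 3, 4)
batch_labels = lookup(batch)    # loops over the three leading axes

print(labels.shape, batch_labels.shape)
```

In both calls the output shape is the input shape with the last axis dropped.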

But vectorize warns that signature use is slower:

In [57]: timeit f(arr)
178 µs ± 6.38 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)

Using apply_along_axis, as @Chris does:

In [58]: timeit np.apply_along_axis(lambda x: dic.get(tuple(x)),2, arr)
131 µs ± 1.18 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)

And a list comprehension on the 'flattened' array:

In [59]: timeit np.array([dic.get(tuple(x)) for x in arr.reshape(-1,4)]).reshape(arr.shape[:2])
35.3 µs ± 108 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)

The speed disadvantage of vectorize disappears with larger arrays, though I don't know if that applies to the signature case or not.
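
For a full-size image with only a handful of distinct colours, another option is to do the dict lookup once per unique colour rather than once per pixel. This is a different technique from the three timed above, and I have not timed it. It is only a sketch, assuming np.unique with axis=0 is acceptable and that -1 (named MISSING here, my own choice) can stand in for unmapped colours:

```python
import numpy as np

# Sample data with shape (3, 3, 4): three rows of three RGBA pixels.
arr = np.array([
    [[112, 25, 235, 255], [112, 25, 235, 255], [112, 25, 235, 255]],
    [[35, 35, 30, 255], [41, 40, 37, 255], [39, 41, 37, 255]],
    [[35, 35, 30, 255], [41, 40, 37, 255], [39, 41, 37, 255]],
])

dic = {(112, 25, 235, 255): 0,
       (35, 35, 30, 255): 1,
       (41, 40, 37, 255): 2,
       (39, 41, 37, 255): 3}

MISSING = -1  # hypothetical sentinel for colours that have no entry in dic

# Flatten to one row per pixel, then find the distinct colours and, for each
# pixel, the index of its colour in the array of distinct colours.
pixels = arr.reshape(-1, arr.shape[-1])
uniq, inverse = np.unique(pixels, axis=0, return_inverse=True)

# One dict lookup per distinct colour instead of one per pixel.
codes = np.array([dic.get(tuple(row.tolist()), MISSING) for row in uniq])

# Map the codes back to pixels; flattening inverse first keeps this
# independent of the shape np.unique returns it in.
labels = codes[inverse.reshape(-1)].reshape(arr.shape[:-1])
print(labels)
```

The Python-level loop here runs over the distinct colours only, while the per-pixel work is left to np.unique and integer indexing.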