How can I transform a numpy array of labels into an array of colors, given a correspondence between labels and colors?

I need help finding an efficient method (as fast as possible) to transform a numpy array of labels into a numpy array of colors.

Let's take a simple example:

A contains labels (ints):
```
A = [0,45,45,22,0,45,45,22]
```
B contains all the labels:
```
B = np.unique(A) = [0,45,22]
```
C contains RGB values:
```
C = [[1,0,0], [0,1,0], [0,0,1]]
```

The i-th element of C is the color of the i-th label in B. For example, the color of label 45 is [0,1,0].

According to this, A should be transformed into:

```
[[1,0,0], [0,1,0], [0,1,0], [0,0,1], ...]
```

I already tried the following piece of code, but it is very slow:

```
result = np.array([C[np.where(B==x)[0][0]] for x in A])
```

Does anyone know a more efficient solution?

Thanks in advance :)

Answer:

You can use np.unique's inverse index for that:

```python
import numpy as np

A = np.array([0,45,45,22,0,45,45,22])
C = np.array([[1,0,0], [0,1,0], [0,0,1]])
# inverse_idx[k] is the position of A[k] among the sorted unique labels
_, inverse_idx = np.unique(A, return_inverse=True)

result = C[inverse_idx]
# array([[1, 0, 0], [0, 0, 1], [0, 0, 1], [0, 1, 0], [1, 0, 0], [0, 0, 1], [0, 0, 1], [0, 1, 0]])
```
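A quick way to see why this works: the inverse indices are exactly the positions that rebuild A from its sorted unique values, so indexing any per-label table such as C with them yields one row per element of A. A minimal check on the example data:

```python
import numpy as np

A = np.array([0, 45, 45, 22, 0, 45, 45, 22])
uniques, inverse_idx = np.unique(A, return_inverse=True)

# uniques is sorted: [0, 22, 45]
# inverse_idx[k] is the position of A[k] inside uniques
print(inverse_idx)  # [0 2 2 1 0 2 2 1]

# Indexing the unique values with the inverse indices rebuilds A exactly,
# so indexing a per-label color table with them colors every element of A.
assert np.array_equal(uniques[inverse_idx], A)
```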

Important note: np.unique returns the unique values sorted, so np.unique(A) gives [0, 22, 45], not [0, 45, 22]. If you want the labels in the order in which they first appear, you need an additional step that uses the index of the first occurrence of each value in A:

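```python
import numpy as np

A = np.array([0,45,45,22,0,45,45,22])
C = np.array([[1,0,0], [0,1,0], [0,0,1]])
# idx[j]: index in A of the first occurrence of the j-th sorted unique label
_, idx, inverse_idx = np.unique(A, return_index=True, return_inverse=True)

# argsort applied twice turns first-occurrence positions into appearance ranks
rank = idx.argsort().argsort()
result = C[rank[inverse_idx]]
# array([[1, 0, 0], [0, 1, 0], [0, 1, 0], [0, 0, 1], [1, 0, 0], [0, 1, 0], [0, 1, 0], [0, 0, 1]])
```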
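Mapping first-occurrence positions to appearance ranks takes two argsorts: a single idx.argsort() lists the unique labels in order of appearance, and argsorting that again gives each label's rank. On the example above the two happen to coincide, but not in general. A small sketch with a hypothetical input where they differ:

```python
import numpy as np

# Hypothetical input whose first-appearance order is 5, 1, 3
A = np.array([5, 1, 3, 5, 1])
_, idx, inverse_idx = np.unique(A, return_index=True, return_inverse=True)
# sorted uniques = [1, 3, 5]; idx = first occurrence of each = [1, 2, 0]

once = idx.argsort()            # [2, 0, 1]: uniques listed in appearance order
rank = idx.argsort().argsort()  # [1, 2, 0]: appearance rank of each unique value

# Expected mapping by first appearance: 5 -> 0, 1 -> 1, 3 -> 2
print(rank[inverse_idx])  # [0 1 2 0 1]
print(once[inverse_idx])  # [1 2 0 1 2]  (not the appearance order)
```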

Another answer:

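```python
import numpy as np
from numba import njit

A = np.array([0,45,45,22,0,45,45,22])
B = np.unique(A)  # note: np.unique sorts, so B is [0, 22, 45]
C = np.array([[1,0,0], [0,1,0], [0,0,1]])

@njit
def f(arr, labels, colors):
    result = np.zeros((len(arr), 3))
    # one pass over arr per label: fill the rows whose label matches
    for i, label in enumerate(labels):
        result[arr==label] = colors[i]
    return result
```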

Compile the function first by calling it on a single element of A (numba compiles on the first call):

```python
f(A[:1], B, C)
```

Now:

```python
result = f(A, B, C)
```

On my machine it takes 9.5367431640625e-05 s, vs 3.123283e-04 s for your solution.

I also tried my function on an A of 1,000,000 numbers: it took 0.0359194278717041 s, vs 5.217064619064331 s for your solution.
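If B is given directly, in any order (like [0,45,22] in the question), another pure NumPy option, not taken from either answer, is to locate each label of A in B with np.searchsorted and then index C. A sketch, assuming every value of A occurs in B; labels_to_colors is a hypothetical helper name:

```python
import numpy as np

def labels_to_colors(A, B, C):
    """Hypothetical helper: map each label in A to the row of C at that label's position in B.

    Assumes every value of A occurs in B; B may be in any order.
    """
    A = np.asarray(A)
    B = np.asarray(B)
    C = np.asarray(C)
    sorter = np.argsort(B)                                # order that sorts B
    pos_in_sorted = np.searchsorted(B, A, sorter=sorter)  # position of each A in sorted(B)
    pos_in_sorted = np.minimum(pos_in_sorted, len(B) - 1) # keep indices in range
    pos = sorter[pos_in_sorted]                           # position of each A in the original B
    if not np.array_equal(B[pos], A):
        raise ValueError("A contains labels that are not in B")
    return C[pos]

A = np.array([0, 45, 45, 22, 0, 45, 45, 22])
B = np.array([0, 45, 22])  # labels in the question's order
C = np.array([[1, 0, 0], [0, 1, 0], [0, 0, 1]])
print(labels_to_colors(A, B, C))
```

This does one binary search per element of A instead of one pass over A per label, so its cost grows roughly with len(A) * log(len(B)) rather than len(A) * len(B).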