I need help finding an efficient (ideally the fastest possible) way to transform a NumPy array of labels into a NumPy array of colors.
Let's take a simple example:
A contains labels (ints):
A = [0,45,45,22,0,45,45,22]
B contains all the distinct labels:
B = np.unique(A) = [0,45,22]
C contains RGB values:
C = [[1,0,0], [0,1,0], [0,0,1]]
and the i-th element of C is the color of the i-th label in B. For example, the color of label 45 is [0,1,0].
According to this, A should be transformed into:
[[1,0,0], [0,1,0], [0,1,0], [0,0,1], ...]
I already tried the following piece of code, but it is very slow:
result = np.array([C[np.where(B==x)[0][0]] for x in A])
Does anyone know a more efficient solution?
Thanks in advance :)
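For comparison, the list comprehension above runs np.where over all of B for every element of A, so its cost grows with len(A) * len(B). A minimal pure-Python improvement (not taken from either answer), assuming B has no duplicate labels, builds a label-to-row dictionary once so that each element of A costs a single lookup. The name row_of is illustrative only:

```python
import numpy as np

A = np.array([0, 45, 45, 22, 0, 45, 45, 22])
B = np.array([0, 45, 22])  # labels in the order given in the question
C = np.array([[1, 0, 0], [0, 1, 0], [0, 0, 1]])

# Build the label -> row-of-C mapping once (assumes B has no duplicates).
row_of = {label: i for i, label in enumerate(B.tolist())}

# One dictionary lookup per element of A instead of a full scan of B.
rows = np.array([row_of[x] for x in A.tolist()])
result = C[rows]
```

This is still a Python-level loop, so the vectorized approaches in the answers should be faster on large arrays.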
Answer:
You can use np.unique's inverse index for that:
import numpy as np
A = np.array([0,45,45,22,0,45,45,22])
C = np.array([[1,0,0], [0,1,0], [0,0,1]])
_, inverse_idx = np.unique(A, return_inverse=True)
result = C[inverse_idx]
# array([[1, 0, 0], [0, 0, 1], [0, 0, 1], [0, 1, 0], [1, 0, 0], [0, 0, 1], [0, 0, 1], [0, 1, 0]])
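To see why this works: the inverse index holds, for each element of A, its position in the sorted array of unique labels, so indexing the unique labels with it rebuilds A, and indexing C with it picks row k of C for the k-th sorted label. A small check, using only the np.unique call from the answer:

```python
import numpy as np

A = np.array([0, 45, 45, 22, 0, 45, 45, 22])
C = np.array([[1, 0, 0], [0, 1, 0], [0, 0, 1]])

uniq, inverse_idx = np.unique(A, return_inverse=True)
# uniq is sorted: [0, 22, 45]
# inverse_idx[i] is the position of A[i] inside uniq: [0, 2, 2, 1, 0, 2, 2, 1]
assert (uniq[inverse_idx] == A).all()  # indexing uniq with it rebuilds A

# Row k of C therefore colors the k-th *sorted* label (here 22 -> [0, 1, 0]).
result = C[inverse_idx]
```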
Important note: np.unique returns the unique values in sorted order (and the returned indices refer to that sorted order), so np.unique(A) gives [0, 22, 45], not [0, 45, 22]. If you want the labels in the order in which they first appear, you need an extra step that uses the index of each label's first occurrence in A:
import numpy as np
A = np.array([0,45,45,22,0,45,45,22])
C = np.array([[1,0,0], [0,1,0], [0,0,1]])
_, idx, inverse_idx = np.unique(A, return_index=True, return_inverse=True)
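# Rank of each sorted label by first appearance (the second argsort turns the permutation into ranks)
order = idx.argsort().argsort()
result = C[order[inverse_idx]]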
# array([[1, 0, 0], [0, 1, 0], [0, 1, 0], [0, 0, 1], [1, 0, 0], [0, 1, 0], [0, 1, 0], [0, 0, 1]])
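Why the order needs two argsort calls: if first_idx[j] is where the j-th sorted label first occurs in A, that label's order of appearance is its rank among first_idx, which is first_idx.argsort().argsort(). A single argsort() gives the inverse of that permutation; the two coincide for the example A (the permutation is its own inverse) but not in general. A sketch with a label order where they differ; the function name is hypothetical:

```python
import numpy as np

def colors_in_order_of_appearance(A, C):
    """Row k of C is the color of the k-th distinct label in order of first appearance."""
    _, first_idx, inverse_idx = np.unique(A, return_index=True, return_inverse=True)
    # first_idx[j]: position in A where the j-th sorted label first occurs.
    # Its rank among first_idx is that label's order of appearance.
    rank = first_idx.argsort().argsort()
    return C[rank[inverse_idx]]

C = np.array([[1, 0, 0], [0, 1, 0], [0, 0, 1]])
A2 = np.array([5, 1, 3, 5])  # appearance order: 5, 1, 3

result2 = colors_in_order_of_appearance(A2, C)
# -> [[1, 0, 0], [0, 1, 0], [0, 0, 1], [1, 0, 0]]

_, fi, inv = np.unique(A2, return_index=True, return_inverse=True)
naive2 = C[fi.argsort()[inv]]  # single argsort: label 5 wrongly gets [0, 1, 0]
```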
Answer:
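import numpy as np
from numba import njit

A = np.array([0,45,45,22,0,45,45,22])
B = np.unique(A)  # sorted: [0, 22, 45], so colors[i] belongs to the i-th sorted label
C = np.array([[1,0,0], [0,1,0], [0,0,1]])

@njit
def f(arr, labels, colors):
    # one output row per element of arr, one column per RGB channel
    result = np.zeros((len(arr), 3))
    for i, label in enumerate(labels):
        # paint every position holding this label with its color
        result[arr==label] = colors[i]
    return result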
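Compile the function first by calling it on a single element of A (the first call triggers the JIT compilation, so it should not be included in the timing):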
f(A[:1], B, C)
Now:
result = f(A, B, C)
It takes 9.5367431640625e-05 sec on my machine, vs 3.123283e-04 sec for your solution.
I also tried my function on an A of 1,000,000 numbers, and it took 0.0359194278717041 sec vs 5.217064619064331 sec for your solution.
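A numba-free alternative (not from either answer) for when B is given explicitly in any order, like [0,45,22] in the question: sort B once and locate every element of A with np.searchsorted, using its sorter argument. This is a sketch assuming B has no duplicates; the function name labels_to_colors is hypothetical, and no timings are claimed for it:

```python
import numpy as np

def labels_to_colors(A, B, C):
    """Return C[j] for each a in A, where B[j] == a (B in any order, no duplicates)."""
    order = np.argsort(B)                       # permutation that sorts B
    pos = np.searchsorted(B, A, sorter=order)   # position of each A[i] in sorted(B)
    pos = np.minimum(pos, len(B) - 1)           # keep out-of-range values indexable
    j = order[pos]                              # back to indices into the original B
    # Fail loudly if some value of A does not appear in B.
    if not np.array_equal(B[j], A):
        raise ValueError("A contains labels that are not in B")
    return C[j]

A = np.array([0, 45, 45, 22, 0, 45, 45, 22])
B = np.array([0, 45, 22])
C = np.array([[1, 0, 0], [0, 1, 0], [0, 0, 1]])
result = labels_to_colors(A, B, C)
# -> [[1, 0, 0], [0, 1, 0], [0, 1, 0], [0, 0, 1], [1, 0, 0], [0, 1, 0], [0, 1, 0], [0, 0, 1]]
```

The cost is one sort of B plus a binary search per element of A, i.e. roughly O((len(A) + len(B)) log len(B)), instead of one pass over A per label.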