I have a little bit of a tricky problem here...
Given two arrays A and B
A = np.array([8, 5, 3, 7])
B = np.array([5, 5, 7, 8, 3, 3, 3])
I would like to replace the values in B with the index of that value in A. In this example case, that would look like:
[1, 1, 3, 0, 2, 2, 2]
For the problem I'm working on, A and B contain the same set of values and all of the entries in A are unique.
The simple way to solve this is to use something like:
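B_new = np.empty_like(B)  # output array, same shape as B
for idx in range(len(A)):
    ind = np.where(B == A[idx])[0]  # positions in B holding the value A[idx]
    B_new[ind] = idx  # store the index into A, not the value itself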
But the B array I'm working with contains almost a million elements and using a for loop gets super slow. There must be a way to vectorize this, but I can't figure it out. The closest I've come is to do something like
np.intersect1d(A, B, return_indices=True)
But this only gives me the first occurrence of each element of A in B. Any suggestions?
CodePudding user response:
The solution of @mozway is good for small arrays but not for big ones, as it runs in O(n**2) time (i.e. quadratic time; see time complexity for more information). Here is a much better solution for big arrays, running in O(n log n) time (i.e. quasi-linear), based on a fast binary search:
unique_values, index = np.unique(A, return_index=True)
result = index[np.searchsorted(unique_values, B)]
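To make the two steps easier to follow, here is a minimal self-contained sketch run on the arrays from the question. The function name `index_in` is just a hypothetical helper for illustration, and the sketch assumes every value of B actually occurs in A (as stated in the question); `np.searchsorted` does not check this, so a missing value would either give a silently wrong index or an out-of-bounds error.

```python
import numpy as np

def index_in(A, B):
    """Hypothetical helper: for each value in B, return its position in A.

    Assumes the values of A are unique and every value of B is present in A.
    """
    # Sorted copy of A, plus the original position of each sorted value.
    unique_values, index = np.unique(A, return_index=True)
    # Binary search: rank of each B value within the sorted values.
    ranks = np.searchsorted(unique_values, B)
    # Map each rank back to the position that value had in A.
    return index[ranks]

A = np.array([8, 5, 3, 7])
B = np.array([5, 5, 7, 8, 3, 3, 3])

result = index_in(A, B)
print(result)  # [1 1 3 0 2 2 2]
```

The sort happens once on A, and each element of B then costs only one binary search, which is where the quasi-linear behaviour comes from.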
CodePudding user response:
Use numpy broadcasting:
np.where(B[:, None]==A)[1]
NB. the values in A must be unique.
Output:
array([1, 1, 3, 0, 2, 2, 2])
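For anyone wondering what the broadcasting one-liner does under the hood, here is a small sketch that spells out the intermediate array. The variable names `mask`, `rows` and `cols` are mine, not part of the answer. The comparison builds a boolean array with one row per element of B and one column per element of A, so memory grows as len(B) * len(A), which is worth keeping in mind for a B with a million elements.

```python
import numpy as np

A = np.array([8, 5, 3, 7])
B = np.array([5, 5, 7, 8, 3, 3, 3])

# B[:, None] has shape (7, 1); comparing with A (shape (4,)) broadcasts
# to a (7, 4) boolean array where mask[i, j] is True when B[i] == A[j].
mask = B[:, None] == A
print(mask.shape)  # (7, 4)

# np.where on a 2-D array returns (row indices, column indices) in
# row-major order. With unique values in A, each row has exactly one
# True, so the column indices line up one-to-one with the elements of B.
rows, cols = np.where(mask)
print(cols)  # [1 1 3 0 2 2 2]
```

If a value of B were missing from A, its row would contain no True at all and the result would come out shorter than B, so this also relies on B only containing values from A.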
CodePudding user response:
Though I can't tell exactly what the complexity of this is, I believe it will perform quite well:
A.argsort()[np.unique(B, return_inverse = True)[1]]
array([1, 1, 3, 0, 2, 2, 2], dtype=int64)
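Here is a rough step-by-step sketch of why the one-liner above works, using the arrays from the question. The intermediate names `order` and `inverse` are only for illustration. Note that it relies on A and B containing exactly the same set of values (which the question guarantees): the ranks computed from B are only valid positions in the sorted A under that assumption.

```python
import numpy as np

A = np.array([8, 5, 3, 7])
B = np.array([5, 5, 7, 8, 3, 3, 3])

# order[k] is the position in A of the k-th smallest value of A.
order = A.argsort()
print(order)  # [2 1 3 0]

# inverse[i] is the rank of B[i] among the sorted unique values of B.
_, inverse = np.unique(B, return_inverse=True)
print(inverse)  # [1 1 2 3 0 0 0]

# Because A and B share the same set of values (assumption from the
# question), rank k in B refers to the same value as rank k in A.
result = order[inverse]
print(result)  # [1 1 3 0 2 2 2]
```

Both `argsort` and `np.unique` are sort-based, so the cost should be dominated by sorting the two arrays.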