Index of Position of Values from B in A

Time:05-18

I have a little bit of a tricky problem here...

Given two arrays A and B

import numpy as np

A = np.array([8, 5, 3, 7])
B = np.array([5, 5, 7, 8, 3, 3, 3])

I would like to replace the values in B with the index of that value in A. In this example case, that would look like:

[1, 1, 3, 0, 2, 2, 2]

For the problem I'm working on, A and B contain the same set of values and all of the entries in A are unique.

The simple way to solve this is to use something like:

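B_new = np.empty_like(B)  # output array, same shape as B
for idx in range(len(A)):
    ind = np.where(B == A[idx])[0]  # positions in B holding the value A[idx]
    B_new[ind] = idx                # store the index into A, not the value itself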

But the B array I'm working with contains almost a million elements and using a for loop gets super slow. There must be a way to vectorize this, but I can't figure it out. The closest I've come is to do something like

np.intersect1d(A, B, return_indices=True)

But this only gives me the first occurrence of each element of A in B. Any suggestions?

CodePudding user response:

The solution of @mozway is good for small arrays but not for big ones, as it runs in O(n**2) time (i.e. quadratic time; see time complexity for more information). Here is a much better solution for big arrays, running in O(n log n) time (i.e. quasi-linear) and based on a fast binary search:

unique_values, index = np.unique(A, return_index=True)
result = index[np.searchsorted(unique_values, B)]
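For reference, here is a minimal self-contained sketch of the two lines above, run on the arrays from the question. The wrapper name `positions_in` is just a hypothetical helper for illustration, and the sketch assumes (as the question states) that A has no duplicates and that every value of B occurs in A; `np.searchsorted` does not check this and would quietly return a wrong or out-of-range position otherwise.

```python
import numpy as np

def positions_in(A, B):
    # Hypothetical helper: for each element of B, return its position in A.
    # np.unique sorts A; `index` maps each sorted value back to where it sits in A.
    unique_values, index = np.unique(A, return_index=True)
    # Binary-search every element of B in the sorted values (O(len(B) * log(len(A)))).
    return index[np.searchsorted(unique_values, B)]

A = np.array([8, 5, 3, 7])
B = np.array([5, 5, 7, 8, 3, 3, 3])

result = positions_in(A, B)
print(result)  # [1 1 3 0 2 2 2]
```

Sorting A costs O(len(A) log len(A)) once, and each lookup in B is then a binary search, which is where the quasi-linear behaviour comes from.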

CodePudding user response:

Use numpy broadcasting:

np.where(B[:, None]==A)[1]

NB: the values in A must be unique, and every value in B must occur in A, otherwise the output no longer lines up one-to-one with B.

Output:

array([1, 1, 3, 0, 2, 2, 2])
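To see what the broadcasting one-liner is doing, here is a small sketch that breaks it into steps (the variable names `mask`, `rows` and `cols` are mine, not part of the answer). It shows that the comparison materialises a boolean array with one row per element of B and one column per element of A, which is probably why this approach gets expensive when B has around a million entries.

```python
import numpy as np

A = np.array([8, 5, 3, 7])
B = np.array([5, 5, 7, 8, 3, 3, 3])

# B[:, None] has shape (len(B), 1); comparing it against A broadcasts
# to a (len(B), len(A)) boolean matrix of all pairwise comparisons.
mask = B[:, None] == A
print(mask.shape)  # (7, 4)

# np.where scans the matrix row by row. With unique values in A and every
# value of B present in A, each row holds exactly one True, so the column
# indices come out in the same order as B.
rows, cols = np.where(mask)
print(cols)  # [1 1 3 0 2 2 2]
```

Memory and time both scale with len(A) * len(B) here, so it is best kept for small inputs.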

CodePudding user response:

Though I can't tell exactly what the complexity of this is, I believe it will perform quite well:

A.argsort()[np.unique(B, return_inverse = True)[1]]

Output:

array([1, 1, 3, 0, 2, 2, 2], dtype=int64)
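Here is a rough sketch of why the one-liner above works, split into named steps (`order`, `inverse`, `B_partial` and `wrong` are my own names for illustration). Both `argsort` and `np.unique` are sort-based, so the cost should be roughly O(n log n) in the sizes of A and B. Note that it relies on A and B containing exactly the same set of values, as stated in the question: the k-th smallest value of B is then also the k-th smallest value of A.

```python
import numpy as np

A = np.array([8, 5, 3, 7])
B = np.array([5, 5, 7, 8, 3, 3, 3])

# order[k] is the position in A of its k-th smallest value.
order = A.argsort()
# inverse[i] is the rank of B[i] among the sorted unique values of B.
_, inverse = np.unique(B, return_inverse=True)

# Same value set in A and B => rank in B equals rank in A.
result = order[inverse]
print(result)  # [1 1 3 0 2 2 2]

# Caveat: if B is missing some value of A, the ranks no longer match.
B_partial = np.array([8, 8, 5])  # 3 and 7 never occur
_, inv_partial = np.unique(B_partial, return_inverse=True)
wrong = order[inv_partial]
print(wrong)  # [1 1 2], whereas the true positions are [0 0 1]
```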