Numpy find identical element in two arrays

Time:05-31

Suppose I have two arrays, a and b. How can I find the rows that appear in both arrays?

a = np.array([[262.5, 262.5, 45],
              [262.5, 262.5, 15],
              [262.5, 187.5, 45],
              [262.5, 187.5, 15],
              [187.5, 262.5, 45],
              [187.5, 262.5, 15],
              [187.5, 187.5, 45],
              [187.5, 187.5, 15]])

b = np.array([[262.5, 262.5, 45],
              [262.5, 262.5, 15],
              [3,3,5],
              [5,5,7],
              [8,8,9]])

I tried the code below, but the output is not what I want. Can anyone tell me what is wrong with it, or is there another way to do this?

out = [x[(x == b[:,None]).all(1).any(0)] for x in a]

The output I want is:

array([[262.5, 262.5, 45],
       [262.5, 262.5, 15]])

CodePudding user response:

a[np.all([np.isin(ai, b) for ai in a], axis=1)]

or also:

b[np.all([np.isin(bi, a) for bi in b], axis=1)]
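
One caveat with the np.isin approach above: np.isin tests individual elements, not whole rows, so a row whose values all occur somewhere in b is kept even if that exact row is not in b. The sketch below is one possible row-wise alternative using broadcasting; the names pairwise, row_match, mask, common and the extra test row c are made up for illustration and are not part of the original answer.

```python
import numpy as np

a = np.array([[262.5, 262.5, 45],
              [262.5, 262.5, 15],
              [262.5, 187.5, 45],
              [262.5, 187.5, 15],
              [187.5, 262.5, 45],
              [187.5, 262.5, 15],
              [187.5, 187.5, 45],
              [187.5, 187.5, 15]])

b = np.array([[262.5, 262.5, 45],
              [262.5, 262.5, 15],
              [3, 3, 5],
              [5, 5, 7],
              [8, 8, 9]])

# Compare every row of a with every row of b:
# (len(a), 1, 3) == (1, len(b), 3) broadcasts to (len(a), len(b), 3).
pairwise = a[:, None, :] == b[None, :, :]

# A pair of rows matches only if all three columns are equal.
row_match = pairwise.all(axis=2)   # shape (len(a), len(b))

# Keep the rows of a that match at least one row of b.
mask = row_match.any(axis=1)       # shape (len(a),)
common = a[mask]
print(common)

# Hypothetical row showing the difference: each value occurs in b,
# but the row itself does not.
c = np.array([[262.5, 15, 45]])
elementwise_hit = bool(np.isin(c, b).all(axis=1)[0])
rowwise_hit = bool((c[:, None, :] == b[None, :, :]).all(axis=2).any(axis=1)[0])
print(elementwise_hit, rowwise_hit)
```

This builds a temporary boolean array of shape (len(a), len(b), 3), so it is best suited to arrays of moderate size. It also uses exact float equality; if the values come from computations rather than literals, swapping == for np.isclose may be more appropriate.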

CodePudding user response:

If you are not tied to NumPy all the way (which I think is the case, given the list comprehension), you can use a set intersection:

x = set(map(tuple, a)).intersection(set(map(tuple, b)))
print(x)
# {(262.5, 262.5, 15.0), (262.5, 262.5, 45.0)}

You can convert this back to an np.ndarray with:

xarr = np.array(list(x))
print(repr(xarr))
# array([[262.5, 262.5,  45. ],
#        [262.5, 262.5,  15. ]])
# (row order may differ, since sets are unordered)
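
Because a set does not keep insertion order, the rows may come back in a different order than they appear in a. If the order matters, one option (a sketch, not part of the original answer; b_rows, keep and common are illustrative names) is to put only b's rows in a set and filter a against it:

```python
import numpy as np

a = np.array([[262.5, 262.5, 45],
              [262.5, 262.5, 15],
              [262.5, 187.5, 45],
              [187.5, 187.5, 15]])

b = np.array([[262.5, 262.5, 15],
              [262.5, 262.5, 45],
              [3, 3, 5]])

# Hash each row of b once so membership tests are cheap.
b_rows = set(map(tuple, b))

# Boolean mask over a: True where the whole row also occurs in b.
keep = np.array([tuple(row) in b_rows for row in a], dtype=bool)

# Boolean indexing preserves the original row order of a.
common = a[keep]
print(common)
```

The rows come out in the order they have in a, regardless of how they are ordered in b.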

CodePudding user response:

It is not clear whether you want only the first contiguous block or not. Let's assume not, and that you want to retrieve all rows that sit at the same index in both arrays and whose elements are all equal:

import numpy as np

a = np.array(
    [
        [1, 1, 1],
        [2, 2, 2],
        [3, 3, 3],
        [4, 4, 4],
        [5, 5, 5],
        [6, 6, 6],
    ]
)

b = np.array(
    [
        [1, 1, 1],
        [2, 2, 2],
        [0, 0, 0],
        [0, 0, 0],
        [5, 5, 5],
    ]
)

expected = np.array(
    [
        [1, 1, 1],
        [2, 2, 2],
        [5, 5, 5],
    ]
)

The first method uses a Python loop, which may be slow for large arrays:

out = np.array([x for x, y in zip(a, b) if np.all(x == y)])
assert np.all(out == expected)

The second method is vectorized and therefore much faster. You just need to crop the arrays to the same length beforehand, because they have different numbers of rows (zip does that silently):

num_rows = min(a.shape[0], b.shape[0])
a_ = a[:num_rows]
b_ = b[:num_rows]

rows_mask = np.all(a_ == b_, axis=-1)
out = a_[rows_mask, :]

assert np.all(out == expected)