Checking column-wise if elements in an array exist in another array

Time:09-21

I have two arrays that look like this:

import numpy as np

x1 = np.array([['a','b','c'],['d','a','b'],['c','a,c','c']])
x2 = np.array(['d','c','d'])

I want to check, for each j, whether x2[j] exists in the corresponding column x1[:, j]. So I tried:

print((x1 == x2).any(axis=0))
# [ True False False]
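To see why this misses a match, it helps to look at the comparison matrix before .any() reduces it. This small illustration uses the same arrays: because x2 has shape (3,), broadcasting compares x1[i, j] with x2[j], and == compares whole strings.

```python
import numpy as np

x1 = np.array([['a', 'b', 'c'], ['d', 'a', 'b'], ['c', 'a,c', 'c']])
x2 = np.array(['d', 'c', 'd'])

# Broadcasting aligns x2 with the last axis of x1:
# eq[i, j] is x1[i, j] == x2[j].
eq = x1 == x2
print(eq)
# [[False False False]
#  [ True False False]
#  [False False False]]

# 'a,c' == 'c' is False: equality never looks inside 'a,c'.
print(eq.any(axis=0))  # [ True False False]
```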

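Note that x2[1] ('c') is one of the comma-separated items of x1[2,1] ('a,c'), yet the check above misses it because == compares whole strings. The problem is that the element I'm looking for is sometimes part of an element of x1, and it can only be identified by splitting that element on commas. So my desired output is: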

array([ True,  True, False])
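For reference, the desired semantics can be spelled out as a plain Python loop. This is only an illustrative sketch (column_contains is a hypothetical name, not a numpy or pandas function), useful as a correctness check for faster versions. It splits on commas instead of doing a substring test, so 'c' does not match 'ca'.

```python
import numpy as np

def column_contains(x1, x2):
    """Hypothetical reference helper: for each column j, is x2[j] one of
    the comma-separated items of some cell x1[i, j]?"""
    n_rows, n_cols = x1.shape
    out = np.zeros(n_cols, dtype=bool)
    for j in range(n_cols):
        # Split each cell into items and test exact membership.
        out[j] = any(x2[j] in x1[i, j].split(',') for i in range(n_rows))
    return out

x1 = np.array([['a', 'b', 'c'], ['d', 'a', 'b'], ['c', 'a,c', 'c']])
x2 = np.array(['d', 'c', 'd'])
print(column_contains(x1, x2))  # [ True  True False]

# Whole items only: 'c' is not an item of 'ca'.
print(column_contains(np.array([['ca']]), np.array(['c'])))  # [False]
```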

Is there a way to do it using a numpy (or pandas) native method?

Answer:

You can use np.vectorize to broadcast a custom function that checks whether x2[j] is among the comma-split items of x1[i, j]:

@np.vectorize
def f(a, b):
    return b in a.split(',')

f(x1, x2).any(axis=0)
# array([ True,  True, False])
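A small variant of the same idea passes otypes=[bool]. Without it, np.vectorize determines the output type by calling the function on the first element, and it raises a ValueError on size-0 inputs. The name contains_token is illustrative only.

```python
import numpy as np

# Same membership test as f above, with the output dtype declared up front.
contains_token = np.vectorize(
    lambda cell, target: target in cell.split(','), otypes=[bool]
)

x1 = np.array([['a', 'b', 'c'], ['d', 'a', 'b'], ['c', 'a,c', 'c']])
x2 = np.array(['d', 'c', 'd'])

result = contains_token(x1, x2).any(axis=0)
print(result)  # [ True  True False]

# With otypes set, an empty x1 works instead of raising ValueError.
empty = contains_token(np.empty((0, 3), dtype=str), x2)
print(empty.shape)  # (0, 3)
```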

Note that "vectorize" is a misnomer: np.vectorize is essentially a Python-level for loop, not true vectorization. It is just a convenient way to broadcast a custom function over arrays.
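If you would rather stay within NumPy's own string routines, one alternative (not from the original answer) is to pad every cell and every target with commas and then do a substring search with np.char.find. The padding turns a substring hit into a whole-item match, assuming the items have no spaces around the commas. I have not benchmarked this against np.vectorize; in NumPy 1.x the np.char functions also operate element by element, so a speedup is not guaranteed. token_match is a hypothetical helper name.

```python
import numpy as np

def token_match(cells, targets, sep=','):
    """Hypothetical helper: True where targets[j] is a whole sep-delimited
    item of cells[i, j], found by substring search on padded strings."""
    # ',a,c,' contains ',c,' but ',ca,' does not, so padding with the
    # separator turns a substring test into a whole-item test.
    padded_cells = np.char.add(np.char.add(sep, cells), sep)
    padded_targets = np.char.add(np.char.add(sep, targets), sep)
    # np.char.find broadcasts padded_targets (shape (3,)) across the rows of
    # padded_cells (shape (3, 3)) and returns -1 where there is no match.
    return np.char.find(padded_cells, padded_targets) >= 0

x1 = np.array([['a', 'b', 'c'], ['d', 'a', 'b'], ['c', 'a,c', 'c']])
x2 = np.array(['d', 'c', 'd'])
print(token_match(x1, x2).any(axis=0))  # [ True  True False]
```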


Since you mentioned pandas, another option is to build df = pd.DataFrame(x1) and apply, column by column, a function that splits the cells and checks membership against the matching x2 entry.
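A different pandas formulation (not one of the answer's helpers, and not timed here) reshapes to long format instead of applying a function per column: melt gives one row per cell, split plus explode gives one row per comma-separated item, and a groupby reduces back to one result per column. It assumes pandas 1.1 or later for explode(..., ignore_index=True).

```python
import numpy as np
import pandas as pd

x1 = np.array([['a', 'b', 'c'], ['d', 'a', 'b'], ['c', 'a,c', 'c']])
x2 = np.array(['d', 'c', 'd'])

# One row per cell: 'variable' is the column label (0, 1, 2), 'value' the text.
long = pd.DataFrame(x1).melt()
# Split each cell on commas and give every item its own row.
long['value'] = long['value'].str.split(',')
long = long.explode('value', ignore_index=True)
# Look up the target for each row's original column and compare item by item.
long['target'] = x2[long['variable'].to_numpy()]
long['hit'] = long['value'] == long['target']
# A column matches if any of its items matched.
result = long.groupby('variable')['hit'].any()
print(result.to_numpy())  # [ True  True False]
```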

However, the np.vectorize version is significantly faster:

f(x1, x2).any(axis=0)         # 24.2 µs ± 2.8 µs
df.apply(list_comp).any()     # 913 µs ± 12.1 µs
df.apply(combine_in).any()    # 1.8 ms ± 104 µs
df.apply(expand_eq_any).any() # 3.28 ms ± 751 µs
The three pandas helpers timed above are defined as follows:

# use a list comprehension to do the splitting and membership checking:
def list_comp(col):
    return [x2[col.name] in val.split(',') for val in col]
# split the whole column and use `combine` to check `x2 in x1`
def combine_in(col):
    return col.str.split(',').combine(x2[col.name], lambda a, b: b in a)
# split the column into expanded columns and check the expanded rows for matches
def expand_eq_any(col):
    return col.str.split(',', expand=True).eq(x2[col.name]).any(axis=1)
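To check that all four approaches agree and to get comparable timings on your own machine, a harness along these lines could be used. It is a sketch built on the standard-library timeit module rather than IPython's %timeit, so it reports a single mean per call without the ± spread, and the absolute numbers will depend on hardware and library versions.

```python
import timeit

import numpy as np
import pandas as pd

x1 = np.array([['a', 'b', 'c'], ['d', 'a', 'b'], ['c', 'a,c', 'c']])
x2 = np.array(['d', 'c', 'd'])
df = pd.DataFrame(x1)

@np.vectorize
def f(a, b):
    return b in a.split(',')

# The three pandas helpers from the answer (they read the global x2).
def list_comp(col):
    return [x2[col.name] in val.split(',') for val in col]

def combine_in(col):
    return col.str.split(',').combine(x2[col.name], lambda a, b: b in a)

def expand_eq_any(col):
    return col.str.split(',', expand=True).eq(x2[col.name]).any(axis=1)

expected = f(x1, x2).any(axis=0)
candidates = {
    'vectorize': lambda: f(x1, x2).any(axis=0),
    'list_comp': lambda: df.apply(list_comp).any().to_numpy(),
    'combine_in': lambda: df.apply(combine_in).any().to_numpy(),
    'expand_eq_any': lambda: df.apply(expand_eq_any).any().to_numpy(),
}

for name, run in candidates.items():
    # Every approach must give the same answer before its timing means anything.
    assert np.array_equal(run(), expected), name
    # autorange() picks a loop count whose total run time is at least 0.2 s.
    loops, total = timeit.Timer(run).autorange()
    print(f'{name:>14}: {total / loops * 1e6:10.1f} µs per call')
```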