Checking if elements in an array exist in a pandas DataFrame

I have a pandas DataFrame and a pandas Series that look like this.

df0 = pd.DataFrame({'col1':['a','b','c','d'],'col2':['b','c','e','f'],'col3':['d','f','g','a']})

  col1 col2 col3
0    a    b    d
1    b    c    f
2    c    e    g
3    d    f    a

df1 = pd.Series(['b','g','g'], index=['col1','col2','col3'])

col1    b
col2    g
col3    g
dtype: object

As you can see, the columns of df0 and the index labels of df1 are the same. For each index label of df1, I want to know whether the value at that label exists in the corresponding column of df0. For example, df1.col1 is b, so we only need to check whether b appears in df0.col1.

Desired output:

array([True, False, True])

Is there a way to do this without using a loop? Maybe a method native to numpy or pandas?

CodePudding user response：

The pandas.DataFrame.eq method is probably the simplest approach.

df0.eq(df1).any()

col1     True
col2    False
col3     True
dtype: bool
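
For reference, here is a minimal self-contained sketch of this approach, reusing df0 and df1 from the question. The names matches and result are introduced here only for illustration. By default, eq aligns the Series index with the DataFrame columns before comparing, and to_numpy() turns the result into the array form asked for in the question.

```python
import pandas as pd

df0 = pd.DataFrame({'col1': ['a', 'b', 'c', 'd'],
                    'col2': ['b', 'c', 'e', 'f'],
                    'col3': ['d', 'f', 'g', 'a']})
df1 = pd.Series(['b', 'g', 'g'], index=['col1', 'col2', 'col3'])

# Element-wise comparison; each df1 value is compared against the column with the same label
matches = df0.eq(df1)

# Collapse each column to a single boolean, then convert to a NumPy array
result = matches.any().to_numpy()
print(result)
```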

CodePudding user response：

Using numpy

You can broadcast df1 to check against df0:

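# Convert df1 to a NumPy array first: recent pandas versions reject df1[None, :] on a Series
np.any(df0 == df1.to_numpy()[None, :], axis=0)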
# col1     True
# col2    False
# col3     True
# dtype: bool

Note that this assumes df1.index and df0.columns have the same order. If not, reindex first:

np.any(df0 == df1.reindex(df0.columns).to_numpy()[None, :], axis=0)
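
To illustrate why the ordering matters, here is a small sketch using a reordered copy of the Series from the question. The names shuffled, positional and aligned are hypothetical and used only for this example. A raw NumPy comparison is purely positional, so it silently compares values against the wrong columns unless the Series is reindexed first.

```python
import numpy as np
import pandas as pd

df0 = pd.DataFrame({'col1': ['a', 'b', 'c', 'd'],
                    'col2': ['b', 'c', 'e', 'f'],
                    'col3': ['d', 'f', 'g', 'a']})

# Same label/value pairs as df1 in the question, but in a different order
shuffled = pd.Series(['g', 'g', 'b'], index=['col3', 'col2', 'col1'])

# Positional comparison: 'g' is checked against col1, 'g' against col2, 'b' against col3
positional = (df0.to_numpy() == shuffled.to_numpy()).any(axis=0)

# Reindexing to df0.columns restores the intended column-by-column comparison
aligned = (df0.to_numpy() == shuffled.reindex(df0.columns).to_numpy()).any(axis=0)

print(positional)  # [False False False]
print(aligned)     # [ True False  True]
```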

Using pandas

Use apply with isin to check whether each df1 value is in the corresponding column of df0:

df0.apply(lambda col: col.isin([df1[col.name]])).any()
# col1     True
# col2    False
# col3     True
# dtype: bool

CodePudding user response：

You can use apply instead of a loop.

Try this:

df0 = pd.DataFrame({'col1':['a','b','c','d'],'col2':['b','c','e','f'],'col3':['d','f','g','a']})
df1 = pd.Series(['b','g','g'], index=['col1','col2','col3'])

df0.apply(lambda x : df1[x.name] in x.values) # for example x <-> 'col1' check this -> 'b' in ['a','b','c','d']
# col1     True    <-> 'b' in ['a','b','c','d']
# col2    False    <-> 'g' in ['b','c','e','f']
# col3     True    <-> 'g' in ['d','f','g','a']
# dtype: bool


df0.apply(lambda x : df1[x.name] in x.values).tolist()
# [True, False, True]

CodePudding user response：

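import pandas as pd

df0 = pd.DataFrame({'col1':['a','b','c','d'],'col2':['b','c','e','f'],'col3':['d','f','g','a']})
df1 = pd.Series(['b','g','g'], index=['col1','col2','col3'])

array = []
for i in range(1, 4):
    col = 'col' + str(i)  # build the column name: 'col1', 'col2', 'col3'
    # str.contains does substring matching, which equals an exact match here
    # only because every value is a single character
    array.append(df0[col].str.contains(df1[col]).any())
print(array)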
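
One caveat about the loop above: str.contains does substring matching (regex matching by default) rather than exact comparison, so it can report a match where no cell equals the value. Here is a small sketch with made-up data. The Series col below is hypothetical and not taken from the question.

```python
import pandas as pd

# Hypothetical column in which no cell is exactly 'a'
col = pd.Series(['ab', 'cd'])

substring_hit = col.str.contains('a').any()  # 'a' is a substring of 'ab'
exact_hit = col.eq('a').any()                # no cell equals 'a' exactly

print(substring_hit, exact_hit)  # True False
```

If exact matches are what you need, eq or isin is the safer choice.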

CodePudding user response：

You can make use of broadcasting:

(df0 == df1).any().values

It also works with NumPy ndarrays:

# df1 is a Series, so its labels live in .index (it has no .columns)
assert (df0.columns == df1.index).all()

(df0.values == df1.values).any(axis=0)

Output:

array([ True, False,  True])

CodePudding user response：

If you'd like a quick one-liner using a list comprehension:

[df1[i] in df0[i].unique() for i in df1.index]

And if it needs to be an array:

np.array([df1[i] in df0[i].unique() for i in df1.index])

The output is:

array([ True, False,  True])