I have a pandas DataFrame and a pandas Series that look like below.
df0 = pd.DataFrame({'col1':['a','b','c','d'],'col2':['b','c','e','f'],'col3':['d','f','g','a']})
  col1 col2 col3
0    a    b    d
1    b    c    f
2    c    e    g
3    d    f    a
df1 = pd.Series(['b','g','g'], index=['col1','col2','col3'])
col1 b
col2 g
col3 g
dtype: object
As you can see, the columns of df0 and the indices of df1 are the same. For each index of df1, I want to know if the value at that index exists in the corresponding column of df0. So, df1.col1 is b, and we need to look for b only in df0.col1 and check if it exists.
Desired output:
array([True, False, True])
Is there a way to do this without using a loop? Maybe a method native to numpy or pandas?
CodePudding user response:
Pandas' pandas.DataFrame.eq method is probably the simplest.
df0.eq(df1).any()
col1 True
col2 False
col3 True
dtype: bool
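Since the desired output is a NumPy array rather than a boolean Series, the result can be converted with to_numpy(). A small sketch using the example frames from the question:

```python
import pandas as pd

df0 = pd.DataFrame({'col1': ['a', 'b', 'c', 'd'],
                    'col2': ['b', 'c', 'e', 'f'],
                    'col3': ['d', 'f', 'g', 'a']})
df1 = pd.Series(['b', 'g', 'g'], index=['col1', 'col2', 'col3'])

# eq() compares each column of df0 against the value at the matching
# label in df1; any() collapses each column to a single boolean;
# to_numpy() converts the resulting Series to a NumPy array
result = df0.eq(df1).any().to_numpy()
print(result)  # [ True False  True]
```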
CodePudding user response:
Using numpy
You can broadcast df1 to check against df0:
np.any(df1[None, :] == df0, axis=0)
# col1 True
# col2 False
# col3 True
# dtype: bool
Note that this assumes df1.index and df0.columns have the same order. If not, reindex first:
np.any(df1.reindex(df0.columns)[None, :] == df0, axis=0)
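To illustrate why the order matters, here is a small sketch (the shuffled series below is hypothetical, built from the question's data) that uses to_numpy() on both sides to make the broadcasting explicit:

```python
import numpy as np
import pandas as pd

df0 = pd.DataFrame({'col1': ['a', 'b', 'c', 'd'],
                    'col2': ['b', 'c', 'e', 'f'],
                    'col3': ['d', 'f', 'g', 'a']})
# Same values as df1 in the question, but indexed in a different order
df1_shuffled = pd.Series(['g', 'g', 'b'], index=['col3', 'col2', 'col1'])

# reindex aligns the series to df0's column order, so the positional
# broadcast compares each value against the correct column
aligned = df1_shuffled.reindex(df0.columns)
result = np.any(aligned.to_numpy()[None, :] == df0.to_numpy(), axis=0)
print(result)  # [ True False  True]
```

Without the reindex, the positional comparison would match values against the wrong columns.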
Using pandas
Use apply to check whether a given df1 value isin the corresponding column of df0:
df0.apply(lambda col: col.isin([df1[col.name]])).any()
# col1 True
# col2 False
# col3 True
# dtype: bool
CodePudding user response:
You can use apply instead of a loop. Try this:
df0 = pd.DataFrame({'col1':['a','b','c','d'],'col2':['b','c','e','f'],'col3':['d','f','g','a']})
df1 = pd.Series(['b','g','g'], index=['col1','col2','col3'])
df0.apply(lambda x : df1[x.name] in x.values) # for example x <-> 'col1' check this -> 'b' in ['a','b','c','d']
# col1 True <-> 'b' in ['a','b','c','d']
# col2 False <-> 'g' in ['b','c','e','f']
# col3 True <-> 'g' in ['d','f','g','a']
# dtype: bool
df0.apply(lambda x : df1[x.name] in x.values).tolist()
# [True, False, True]
CodePudding user response:
import pandas as pd
array=[]
df0 = pd.DataFrame({'col1':['a','b','c','d'],'col2':['b','c','e','f'],'col3':['d','f','g','a']})
df1 = pd.Series(['b','g','g'], index=['col1','col2','col3'])
for i in range(1, 4):
    col = 'col' + str(i)
    array.append(df0[col].str.contains(df1[col]).any())
print(array)
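Note that str.contains does substring (regex) matching, which happens to suffice here because every value is a single character; for exact matching, an equality comparison is stricter. A minimal sketch of that variant:

```python
import pandas as pd

df0 = pd.DataFrame({'col1': ['a', 'b', 'c', 'd'],
                    'col2': ['b', 'c', 'e', 'f'],
                    'col3': ['d', 'f', 'g', 'a']})
df1 = pd.Series(['b', 'g', 'g'], index=['col1', 'col2', 'col3'])

# Exact equality per column instead of substring matching
array = [(df0[col] == df1[col]).any() for col in df1.index]
print(array)  # [True, False, True]
```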
CodePudding user response:
You can make use of broadcasting:
(df0 == df1).any().values
It also works with NumPy ndarrays:
assert (df0.columns == df1.index).all()
(df0.values == df1.values).any(axis=0)
Output:
array([ True, False, True])
CodePudding user response:
If you'd like a quick one-liner using a list comprehension:
[df1[i] in df0[i].unique() for i in df1.index]
And if it needs to be an array:
np.array([df1[i] in df0[i].unique() for i in df1.index])
The output is:
array([ True, False, True])