Is it possible to filter a dataframe using multiples .isin() together?
df1
co | no | ds
A1 | yes| text1
A2 | no | text1
A3 | no | text3
df2
co | no | ds
A1 | yes| text1
A5 | no | text5
A6 | yes| text6
Filter if the entire row is in the other dataframe
df1 = df1[df1['co'].isin(df2['co'])] & df1[df1['no'].isin(df2['no'])] & df1[df1['ds'].isin(df2['ds'])]
df1
co | no | ds
A1 | yes| text1
CodePudding user response:
You can use an inner merge
:
out = df1.merge(df2)
# or to limit to co/no/ds
cols = ['co', 'no', 'ds']
out = df1.merge(df2[cols])
Or, if you have potentially duplicates in df2
, first drop_duplicates
:
out = df1.merge(df2.drop_duplicates())
If you really want to use boolean indexing, then you can take advantage of numpy.logical_and.reduce
to chain multiple conditions without needing to type all masks:
import numpy as np
cols = ['co', 'no', 'ds']
out = df1[np.logical_and.reduce([df1[c].isin(df2[c]) for c in cols])]
CodePudding user response:
Boolean arithmetics (&
) on series are valid only on boolean series. Like what you give for each filtering.
So you just need to apply this &
to the condition. Not to the indexation you get from these booleans series.
In other words
df1[df1['co'].isin(df2['co']) & df1['no'].isin(df2['no']) & df1['ds'].isin(df2['ds'])]
Almost the same line as your non-working one. Except that I apply &
on the condition.
That is, instead of trying to df1[A] & df1[B] & df1[C]
(which doesn't mean much. You could try to intersect those two dataframes df1[A]
, df1[B] and
df1[C]; that costs a bit; but not to apply a
&between them) I do
df1[A & B & C]`.
(Note on mozway's solution: that merge
, indeed, is an elegant way to intersect the dataframes. But that is almost always slower than computing the boolean series index. Plus, in your case, your logic is already the one you need to compute such indexation. Timeit shows that merge
solutions takes 3.59 ms per run, while this one takes 0.83 ms per run)
CodePudding user response:
yes possible see the changes to your code
df1 = df1[(df1['co'].isin(df2['co']) & df1['no'].isin(df2['no']) & df1['ds'].isin(df2['ds']))]
df1
Out[87]:
co no ds
0 A1 yes text1