Easier way of deleting rows in pandas dataframe based on condition from another dataframe-CodePudding

Suppose I have two dataframes

df1 = pd.DataFrame({"A" : [1,1,2,5],
             "B" : [1,1,4,5],
             "C" : ["Adam","Bella","Charlie","Dan"]})

df2 = pd.DataFrame({"A" : [1,1,3,5],
             "B" : [1,3,6,5]})

and I want to delete the rows in df1 that have the same values of A and B with df2

I do this by

for i, row_1 in df1.iterrows():

    for j, row_2 in df2.iterrows():
        
        if row_1["A"] == row_2["A"] and row_1["B"] == row_2["B"]:
            index = i
            df1.drop([index], axis=0, inplace=False)

which resulted in, as intended

    A   B   C
2   2   4   Charlie

I was wondering if there was a much easier/faster way to do this especially if the data frame is large then it is not ideal to iterate over all the rows.

CodePudding user response：

You can left-merge with the indicator parameter to flag the rows that match; then query to filter the rows that come only from df1:

out = df1.merge(df2, how='left', indicator=True).query('_merge=="left_only"').drop(columns=['_merge'])

Output:

   A  B        C
2  2  4  Charlie

CodePudding user response：

Here is another way:

df1.loc[~df1.set_index(['A','B']).index.isin(df2.to_records(index=False).tolist())]

CodePudding user response：

#!pip install siuba
from siuba import anti_join

anti_join(df1, df2, on = ['A', 'B'])

   A  B        C
2  2  4  Charlie

if you want to anti_join on all shared columns:

anti_join(df1, df2)
 
   A  B        C
2  2  4  Charlie