Suppose I have two dataframes:
import pandas as pd
df1 = pd.DataFrame({"A": [1, 1, 2, 5],
                    "B": [1, 1, 4, 5],
                    "C": ["Adam", "Bella", "Charlie", "Dan"]})
df2 = pd.DataFrame({"A": [1, 1, 3, 5],
                    "B": [1, 3, 6, 5]})
and I want to delete the rows in df1 that have the same values of A and B as df2. I do this by:
for i, row_1 in df1.iterrows():
    for j, row_2 in df2.iterrows():
        if row_1["A"] == row_2["A"] and row_1["B"] == row_2["B"]:
            index = i
            # drop the matching row and keep the result
            df1 = df1.drop([index], axis=0, inplace=False)
which gave the intended result:
A B C
2 2 4 Charlie
I was wondering if there is an easier/faster way to do this, since iterating over all the rows is not ideal when the dataframes are large.
CodePudding user response:
You can left-merge with the indicator parameter to flag the rows that match; then query to filter the rows that come only from df1:
out = (df1.merge(df2, how='left', indicator=True)
          .query('_merge == "left_only"')
          .drop(columns=['_merge']))
Output:
A B C
2 2 4 Charlie
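For what it's worth, this is roughly what the intermediate frame looks like before the query step (a sketch using the df1/df2 from the question; the merged and out names are just for illustration):
import pandas as pd

df1 = pd.DataFrame({"A": [1, 1, 2, 5], "B": [1, 1, 4, 5],
                    "C": ["Adam", "Bella", "Charlie", "Dan"]})
df2 = pd.DataFrame({"A": [1, 1, 3, 5], "B": [1, 3, 6, 5]})

# With no `on=` argument, merge joins on all columns the two frames share
# (here A and B). indicator=True adds a `_merge` column recording where each
# row came from: "both" if the (A, B) pair also exists in df2, "left_only"
# if it exists only in df1.
merged = df1.merge(df2, how="left", indicator=True)
# _merge is "both" for Adam, Bella and Dan, and "left_only" for Charlie,
# so keeping the "left_only" rows drops exactly the unwanted ones.
out = merged.query('_merge == "left_only"').drop(columns=["_merge"])
Note that merge returns a new frame with a fresh RangeIndex, so the surviving row keeps the label 2 here only because Charlie already sat at position 2 in df1.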
CodePudding user response:
Here is another way:
df1.loc[~df1.set_index(['A','B']).index.isin(df2.to_records(index=False).tolist())]
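A closely related variant, shown as a sketch (it swaps to_records for pd.MultiIndex.from_frame; the mask and out names are just for illustration), keeps df1's original index:
import pandas as pd

df1 = pd.DataFrame({"A": [1, 1, 2, 5], "B": [1, 1, 4, 5],
                    "C": ["Adam", "Bella", "Charlie", "Dan"]})
df2 = pd.DataFrame({"A": [1, 1, 3, 5], "B": [1, 3, 6, 5]})

# True where a row's (A, B) pair also appears in df2.
mask = df1.set_index(["A", "B"]).index.isin(pd.MultiIndex.from_frame(df2[["A", "B"]]))
out = df1[~mask]   # keeps only index label 2 (Charlie)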
CodePudding user response:
#!pip install siuba
from siuba import anti_join
anti_join(df1, df2, on = ['A', 'B'])
A B C
2 2 4 Charlie
If you want to anti_join on all shared columns:
anti_join(df1, df2)
A B C
2 2 4 Charlie
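Whichever membership test you use, you can also stay close to the drop-by-index style of the original loop: collect the matching labels once and drop them in a single call. A sketch reusing the set_index/isin test shown above (to_drop is just an illustrative name):
import pandas as pd

df1 = pd.DataFrame({"A": [1, 1, 2, 5], "B": [1, 1, 4, 5],
                    "C": ["Adam", "Bella", "Charlie", "Dan"]})
df2 = pd.DataFrame({"A": [1, 1, 3, 5], "B": [1, 3, 6, 5]})

# Index labels of the df1 rows whose (A, B) pair also appears in df2.
to_drop = df1.index[df1.set_index(["A", "B"]).index.isin(
    pd.MultiIndex.from_frame(df2[["A", "B"]]))]
df1 = df1.drop(to_drop)   # one vectorized drop instead of one per matching row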