Home > front end >  Compare two datafarmes in two columns and get the difference
Compare two datafarmes in two columns and get the difference

Time:12-20

Lets say i have a dataframe like this one:

df1:
     col1          col2
0    data1         math
1    data1         math2
2    data2         math
3    data3         math
4    data4         math2

df2:
     col1          col2
0    data1         math
1    data1         math2
2    data1         math3
3    data2         math2
4    data3         math
5    data4         math2
6    data4         math3

how can i compare these two dataframes based on col1 and col2 and get the difference (remove all the rows that match with df1) and have a dataframe like this one:

     col1          col2
0    data1         math3
1    data2         math2
2    data4         math3

I tried this one but it is not working:

df3 = df2[~(df2['col2'].isin(df1['col2']))].reset_index(drop=True)

CodePudding user response:

Your solution should be changed with compare MultiIndex or tuples:

df3 = df2[~df2.set_index(['col1','col2']).index.isin(df1.set_index(['col1','col2']).index)].reset_index(drop=True)

df3 = df2[~df2[['col1','col2']].apply(tuple, 1).isin(df1[['col1','col2']].apply(tuple, 1))].reset_index(drop=True)

CodePudding user response:

You can perform a merge with indicator=True and keep only the right_only rows:

(df1.merge(df2, on=['col1', 'col2'], how='outer', indicator=True)
    .query('_merge == "right_only"')
    .drop(columns='_merge').reset_index(drop=True)
)

output:

    col1   col2
0  data1  math3
1  data2  math2
2  data4  math3
  • Related