Home > Net >  I have two df which i want the values which are not present in df2 when compared
I have two df which i want the values which are not present in df2 when compared

Time:08-01

i have 2 dataframe like below

df1 :

col 1 col 2
val1 val2
xyz abc
ooo ppp

df2:

col 1 col 2
val1 mmmm
xyz abc
ooo sssss

so here the first column is the same but 2nd is different. But i want to compare both the columns at a time get the difference based on their combination.

output:

col 1 col 2
val1 mmmm
ooo sssss

CodePudding user response:

You can try merge then filter

out = (pd.merge(df1, df2, on='col 1', how='outer')
       .query('`col 2_x` != `col 2_y`'))
print(out)

  col 1 col 2_x col 2_y
0  val1    val2    mmmm
2   ooo     ppp   sssss

CodePudding user response:

IIUC, and based on your expected output, you are looking to return all rows of df2 where df1['col 2'] does not match df1['col 2']. If so, you can simply use:

df2[df2['col 2'] != df1['col 2']]

  col 1  col 2
0  val1   mmmm
2   ooo  sssss

If you want col 2 from both dfs here, you could do:

df1[df1['col 2'] != df2['col 2']].merge(df2, on='col 1', how='left', suffixes=['_df1','_df2'])

  col 1 col 2_df1 col 2_df2
0  val1      val2      mmmm
1   ooo       ppp     sssss

Or if you want to preserve the index:

df1[df1['col 2'] != df2['col 2']].reset_index().merge(df2, on='col 1', how='left', suffixes=['_df1','_df2']).set_index('index')

      col 1 col 2_df1 col 2_df2
index                          
0      val1      val2      mmmm
2       ooo       ppp     sssss
  • Related