i have 2 dataframe like below
df1 :
col 1 | col 2 |
---|---|
val1 | val2 |
xyz | abc |
ooo | ppp |
df2:
col 1 | col 2 |
---|---|
val1 | mmmm |
xyz | abc |
ooo | sssss |
so here the first column is the same but 2nd is different. But i want to compare both the columns at a time get the difference based on their combination.
output:
col 1 | col 2 |
---|---|
val1 | mmmm |
ooo | sssss |
CodePudding user response:
You can try merge then filter
out = (pd.merge(df1, df2, on='col 1', how='outer')
.query('`col 2_x` != `col 2_y`'))
print(out)
col 1 col 2_x col 2_y
0 val1 val2 mmmm
2 ooo ppp sssss
CodePudding user response:
IIUC, and based on your expected output, you are looking to return all rows of df2
where df1['col 2']
does not match df1['col 2']
. If so, you can simply use:
df2[df2['col 2'] != df1['col 2']]
col 1 col 2
0 val1 mmmm
2 ooo sssss
If you want col 2
from both dfs here, you could do:
df1[df1['col 2'] != df2['col 2']].merge(df2, on='col 1', how='left', suffixes=['_df1','_df2'])
col 1 col 2_df1 col 2_df2
0 val1 val2 mmmm
1 ooo ppp sssss
Or if you want to preserve the index:
df1[df1['col 2'] != df2['col 2']].reset_index().merge(df2, on='col 1', how='left', suffixes=['_df1','_df2']).set_index('index')
col 1 col 2_df1 col 2_df2
index
0 val1 val2 mmmm
2 ooo ppp sssss