I have two dataframes, they both have the same columns. I want to compare them both and find for each two rows that are different, on which column they have different values
my dataframes are as follow:
the column A is unique key both dataframes share
df1
A B C D E
0 V 10 5 18 20
1 W 9 18 11 13
2 X 8 7 12 5
3 Y 7 9 7 8
4 Z 6 5 3 90
df2
A B C D E
0 V 30 5 18 20
1 W 9 18 11 9
2 X 8 7 12 5
3 Y 36 9 7 8
4 Z 6 5 3 90
expected result:
df3
A key
0 V B
1 W E
3 Y B
What i've tried so far is:
df3 = df1.merge(df2, on=['A', 'B', 'C', 'D', 'E'], how='outer', indicator=True)
df3 = df3[df3._merge != 'both'] #to retrieve only the rows where there's a difference spotted
This is what I get for df3
A B C D E _merge
0 V 10 5 18 20 left_only
1 W 9 18 11 13 left_only
3 Y 7 9 7 8 left_only
5 V 30 5 18 20 right_only
6 W 9 18 11 9 right_only
8 Y 36 9 7 8 right_only
How can I achieve the expected result ?
CodePudding user response:
In your case you can set the index first then eq
s = df1.set_index('A').eq(df2.set_index('A'))
s.mask(s).stack().reset_index()
Out[442]:
A level_1 0
0 V B False
1 W E False
2 Y B False
CodePudding user response:
You can find the differences between the two frames and use idxmax
with axis=1
to get the differing column:
diff = df1.set_index("A") - df2.set_index("A")
result = diff[diff.ne(0)].abs().idxmax(1).dropna()
>>> result
A
V B
W E
Y B
dtype: object