Home > Enterprise >  Compre two dataframes on multiple columns
Compre two dataframes on multiple columns

Time:12-01

I have two dataframes, they both have the same columns. I want to compare them both and find for each two rows that are different, on which column they have different values

my dataframes are as follow:

the column A is unique key both dataframes share

df1

    A   B   C   D   E
0   V   10  5   18  20
1   W   9   18  11  13
2   X   8   7   12  5
3   Y   7   9   7   8
4   Z   6   5   3   90

df2

    A   B   C   D   E
0   V   30  5  18  20
1   W   9   18  11  9
2   X   8   7   12  5
3   Y   36   9   7  8
4   Z   6   5   3   90

expected result:

df3

    A  key
0   V   B  
1   W   E 
3   Y   B

What i've tried so far is:

df3 = df1.merge(df2, on=['A', 'B', 'C', 'D', 'E'], how='outer', indicator=True)
df3 = df3[df3._merge != 'both'] #to retrieve only the rows where there's a difference spotted

This is what I get for df3

    A   B   C   D   E   _merge
0   V   10  5   18  20  left_only
1   W   9   18  11  13  left_only
3   Y   7   9   7   8   left_only
5   V   30  5  18  20   right_only
6   W   9   18  11  9   right_only
8   Y   36   9   7  8   right_only

How can I achieve the expected result ?

CodePudding user response:

In your case you can set the index first then eq

s = df1.set_index('A').eq(df2.set_index('A'))
s.mask(s).stack().reset_index()
Out[442]: 
   A level_1      0
0  V       B  False
1  W       E  False
2  Y       B  False

CodePudding user response:

You can find the differences between the two frames and use idxmax with axis=1 to get the differing column:

diff = df1.set_index("A") - df2.set_index("A")
result = diff[diff.ne(0)].abs().idxmax(1).dropna()

>>> result
A
V    B
W    E
Y    B
dtype: object
  • Related