I am trying to retrieve indexes and columns names of non-matching values between two dfs:
df1:
'A' 'B' 'C'
1 1 5 9
2 2 6 10
3 3 7 11
4 4 8 12
And
df2:
'A' 'B' 'C'
1 1 44 9
2 2 6 10
3 44 7 11
4 4 8 44
I try to use a function that returns a list of tuples in a following way :
Excpected output:
non matching values are indexed : [(3,'A'), (1,'B'), (3,'C')]
What's the better way to do that?
CodePudding user response:
You can use:
s = df1.ne(df2).stack()
out = s[s].index.tolist()
# one-liner version as suggested by @Ynjxsjmh
# out = df1.ne(df2).stack().loc[lambda s: s].index.tolist()
output: [(1, 'B'), (3, 'A'), (4, 'C')]
For a different order (columns first then rows, if this matters):
s = df1.ne(df2).unstack()
out = s[s].swaplevel().index.tolist()
output: [(3, 'A'), (1, 'B'), (4, 'C')]