Find the mismatch between two python dataframe-CodePudding

I have two dataframe below:

df1>>

df2>>

I need to compare the two dataframe and find the id's in col1 where col2 value is a mismatch i.e. for the sample dataset 3 should be the answer.

Please note in the actual problem I have 10^6 number of unique id's in col 1 hence if-else ladder or running loop at each unique col1 key level is taking a very long execution time. I am looking for an efficient method to do this. I really appreciate any help you can provide.

CodePudding user response：

df1 = pd.DataFrame({"col1": [1, 1,2,2,3,3] ,"col2": ['A','B','C','D','E','F']})
df2 = pd.DataFrame({"col1": [1, 1,2,2,3,3] ,"col2": ['A','B','D','C','E','K']})
df3 = df1.merge(df2, indicator=True, how='outer').query('_merge != "both"').drop('_merge', 1)
df3

CodePudding user response：

You can compare dataframes with pd.DataFrame.compare.

>>> df1 = pd.DataFrame({"col1": [1, 1, 2, 2, 3, 3], "col2": list("ABCDEF")})
>>> df2 = pd.DataFrame({"col1": [1, 1, 2, 2, 3, 3], "col2": list("ABDCEK")})
>>> df1.compare(df2)
  col2      
  self other
2    C     D
3    D     C
5    F     K