Cross-check whether two DataFrames have different values and print any that differ

Time:11-23

I have two DataFrames and want to check, for each id, whether the values differ between them; if they do, I need to print those values.

Example:

df1 =
      |id|check_column1|
      |1|abc|
      |1|bcd|
      |2|xyz|
      |2|mno|
      |2|mmm|
df2 =
      |id|check_column2|
      |1|bcd|
      |1|abc|
      |2|xyz|
      |2|mno|
      |2|kkk|

Here the output should be just |2|mmm|kkk|, but I am getting the whole table as output since the indexes are different.

This is what I did:

output = pd.merge(df1,df2, on= ['id'], how='inner')

event4 = output[output.apply(lambda x: x['check_column1'] != x['check_column2'], axis=1)]

CodePudding user response:

The idea is to sort the values per id in both DataFrames and join them on id plus a helper counter created by GroupBy.cumcount; the rows that do not match can then be filtered out:

df1 = df1.sort_values(['id','check_column1'])
df2 = df2.sort_values(['id','check_column2'])
    
df = pd.merge(df1,df2, left_on= ['id',df1.groupby('id').cumcount()], 
                       right_on= ['id',df2.groupby('id').cumcount()])

output = df[df['check_column1'] != df['check_column2']]
print (output)
   id  key_1 check_column1 check_column2
2   2      0           mmm           kkk
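
The steps above can also be written as a self-contained sketch that stores the counter in an explicit helper column instead of passing a Series to the merge. The column name `pos` and the variable names `left`, `right`, `paired` and `diff` are assumptions made for illustration; the sample data is the one from the question.

```python
import pandas as pd

df1 = pd.DataFrame({'id': [1, 1, 2, 2, 2],
                    'check_column1': ['abc', 'bcd', 'xyz', 'mno', 'mmm']})
df2 = pd.DataFrame({'id': [1, 1, 2, 2, 2],
                    'check_column2': ['bcd', 'abc', 'xyz', 'mno', 'kkk']})

# Sort within each id so that equal values end up at the same position
left = df1.sort_values(['id', 'check_column1'])
right = df2.sort_values(['id', 'check_column2'])

# 'pos' (hypothetical helper name) numbers the rows 0, 1, 2, ... inside each id
left = left.assign(pos=left.groupby('id').cumcount())
right = right.assign(pos=right.groupby('id').cumcount())

# Pair rows that share both id and position, then keep the pairs that differ
paired = left.merge(right, on=['id', 'pos'])
diff = paired.loc[paired['check_column1'] != paired['check_column2'],
                  ['id', 'check_column1', 'check_column2']]
print(diff)
```

Pairing by sorted position works for the sample data, where each id has the same number of rows in both frames. If the groups differ in size, or several values differ within one id, the positions may no longer line up and values that both frames share could be reported as differences.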

CodePudding user response:

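# True where the rows at the same position agree on id and value (assumes identical index and row order)
mask = np.where((df1['id'] == df2['id']) & (df1['check_column1'] == df2['check_column2']), True, False)

# Keeps the matching rows; df2[~mask] would give the rows that differ
output = df2[mask]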

CodePudding user response:

You can use np.where to achieve this.

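import numpy as np
import pandas as pd

df1 = pd.DataFrame({'id':[1,1,2,2,2],'check_column1':['abc','bcd','xyz','mno','mmm']})
df2 = pd.DataFrame({'id':[1,1,2,2,2],'check_column2':['bcd','abc','xyz','mno','kkk']})

# Sort both frames so rows line up by position (assumes equal row counts per id)
df1 = df1.sort_values(['id','check_column1']).reset_index(drop=True)
df2 = df2.sort_values(['id','check_column2']).reset_index(drop=True)
# np.where with only a condition returns the positions where it is True
rows = np.where(df1['check_column1'].to_numpy() != df2['check_column2'].to_numpy())[0]
event4 = np.vstack([df1.iloc[rows].to_numpy(), df2.iloc[rows].to_numpy()])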

Output:

array([[2, 'mmm'],
       [2, 'kkk']], dtype=object)
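
A different technique, not taken from the answers above, is an outer merge on both id and value with `indicator=True`, which marks every (id, value) pair that exists in only one frame. This is a minimal sketch under the assumption that a "difference" means a value with no counterpart under the same id; the shared column name `value` and the variable names `left`, `right`, `merged` and `unmatched` are made up for illustration.

```python
import pandas as pd

df1 = pd.DataFrame({'id': [1, 1, 2, 2, 2],
                    'check_column1': ['abc', 'bcd', 'xyz', 'mno', 'mmm']})
df2 = pd.DataFrame({'id': [1, 1, 2, 2, 2],
                    'check_column2': ['bcd', 'abc', 'xyz', 'mno', 'kkk']})

# Give the compared column the same (hypothetical) name in both frames
left = df1.rename(columns={'check_column1': 'value'})
right = df2.rename(columns={'check_column2': 'value'})

# indicator=True adds a '_merge' column: 'both', 'left_only' or 'right_only'
merged = left.merge(right, on=['id', 'value'], how='outer', indicator=True)

# Keep the (id, value) pairs that appear in only one of the two frames
unmatched = merged[merged['_merge'] != 'both']
print(unmatched)
```

Unlike pairing by position, this does not depend on row order or on both frames having the same number of rows per id. It reports the unmatched values on separate rows, tagged with the side they came from, rather than side by side.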