I have two DataFrames, and for each id I want to check whether the values differ between them; if they do, I need to print the values that differ.
Example:
df1 =
|id |check_column1|
|1|abc|
|1|bcd|
|2|xyz|
|2|mno|
|2|mmm|
df2 =
|id |check_column2|
|1|bcd|
|1|abc|
|2|xyz|
|2|mno|
|2|kkk|
Here the output should be just |2|mmm|kkk|, but I am getting the whole table as output since the indexes are different.
This is what I did:
output = pd.merge(df1,df2, on= ['id'], how='inner')
event4 = output[output.apply(lambda x: x['check_column1'] != x['check_column2'], axis=1)]
CodePudding user response:
The idea is to sort the values per id in both columns and join the frames using a helper counter from GroupBy.cumcount; the rows that do not match can then be filtered out:
df1 = df1.sort_values(['id','check_column1'])
df2 = df2.sort_values(['id','check_column2'])
df = pd.merge(df1, df2, left_on=['id', df1.groupby('id').cumcount()],
              right_on=['id', df2.groupby('id').cumcount()])
output = df[df['check_column1'] != df['check_column2']]
print (output)
   id  key_1 check_column1 check_column2
2   2      0           mmm           kkk
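To make the role of the helper counter easier to see, here is a minimal self-contained sketch of the same sort-and-pair idea on the example data from the question. The function name add_position and the column name pos are hypothetical and only used for illustration. It assumes each id has the same number of rows in both frames.

```python
import pandas as pd

df1 = pd.DataFrame({"id": [1, 1, 2, 2, 2],
                    "check_column1": ["abc", "bcd", "xyz", "mno", "mmm"]})
df2 = pd.DataFrame({"id": [1, 1, 2, 2, 2],
                    "check_column2": ["bcd", "abc", "xyz", "mno", "kkk"]})


def add_position(df, value_col):
    """Sort by id and value, then number the rows within each id.

    'pos' is a hypothetical helper column: 0 for the smallest value
    of an id, 1 for the next one, and so on.
    """
    out = df.sort_values(["id", value_col]).reset_index(drop=True)
    out["pos"] = out.groupby("id").cumcount()
    return out


left = add_position(df1, "check_column1")
right = add_position(df2, "check_column2")

# Pair the n-th value of each id in df1 with the n-th value of the same id in df2.
paired = left.merge(right, on=["id", "pos"])

# Keep only the pairs whose values differ.
diff = paired.loc[paired["check_column1"] != paired["check_column2"],
                  ["id", "check_column1", "check_column2"]]
print(diff)
```

Because the pairing is done by sorted rank, an inner merge silently drops rows when one frame has more rows for an id than the other, and several differing values within one id may be paired in an arbitrary-looking way.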
CodePudding user response:
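import numpy as np
# Row-by-row comparison: both frames need the same index and the same row order (for example, sorted by id and value first).
mask = np.where((df1['id'] == df2['id']) & (df1['check_column1'] == df2['check_column2']), True, False)
output = df2[~mask]  # ~mask keeps the rows that differ; mask alone would keep the matching rows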
CodePudding user response:
You can use np.where to achieve this.
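import numpy as np
import pandas as pd
df1 = pd.DataFrame({'id':[1,1,2,2,2],'check_column1':['abc','bcd','xyz','mno','mmm']})
df2 = pd.DataFrame({'id':[1,1,2,2,2],'check_column2':['bcd','abc','xyz','mno','kkk']})
a = df1.sort_values(['id','check_column1']).reset_index(drop=True)  # sort both frames the same way so rows line up by position
b = df2.sort_values(['id','check_column2']).reset_index(drop=True)  # assumes equal row counts per id
rows = np.where(a['check_column1'] != b['check_column2'])[0]  # positions where the values differ
event4 = np.vstack([a.iloc[rows].to_numpy(), b.iloc[rows].to_numpy()])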
Output:
array([[2, 'mmm'],
       [2, 'kkk']], dtype=object)
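For context on why the merge on id alone from the question returns nearly the whole table, here is a small sketch that counts the rows it produces on the example data. The variable names merged, sizes and differing are only for illustration.

```python
import pandas as pd

df1 = pd.DataFrame({"id": [1, 1, 2, 2, 2],
                    "check_column1": ["abc", "bcd", "xyz", "mno", "mmm"]})
df2 = pd.DataFrame({"id": [1, 1, 2, 2, 2],
                    "check_column2": ["bcd", "abc", "xyz", "mno", "kkk"]})

# Merging on 'id' only pairs every df1 row with every df2 row of the same id.
merged = pd.merge(df1, df2, on="id", how="inner")

# Rows per id: 2 * 2 for id 1 and 3 * 3 for id 2.
sizes = merged.groupby("id").size().tolist()
print(len(merged), sizes)

# Most of these pairs differ simply because unrelated values were paired up.
differing = merged[merged["check_column1"] != merged["check_column2"]]
print(len(differing))
```

On this data the merge yields 13 rows and the inequality filter keeps 9 of them, so the extra rows come from the many-to-many pairing within each id rather than from a real difference between the frames.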