i have a first dataframe :
id ; name ; job
AAA1; reacher; doctor
ZZZ2; pete; nurse
SSS3; wilson; prof
and second dataframe :
pk ; name ; job
AAA1; reacher; Doctor
MOIU8; pete; Nurse
HYA1; wilson; Prof
we suppose that code = pk = id
i want to identify code which exist in dataframe1 and not in the dataframe2 and which code exists in dataframe 2 and not in dataframe 1 with pandas
import pandas as pd
# List1
lst = [['AAA1', 'reacher', "doctor"], ['ZZZ2', 'pete', "nurse"],
['SSS3', 'wilson', "prof"], ]
df1 = pd.DataFrame(lst, columns =['id', 'name', 'job'], dtype = float)
# List1
lst = [['AAA1', 'reacher', "doctor"], ['MOIU8', 'koko', "nurse"],
['HYA1', 'andray', "prof"], ]
df2 = pd.DataFrame(lst, columns =['pk', 'name', 'job'], dtype = float)
CodePudding user response:
You can perform an outer merge
with an indicator. The '_merge' column will tell you if a value if present on the left (df1), right (df2) of both. And, you'll have a quick view of the differences in columns.
df1.merge(df2, left_on='id', right_on='pk', how='outer', indicator=True)
output:
id name_x job_x pk name_y job_y _merge
0 AAA1 reacher doctor AAA1 reacher doctor both
1 ZZZ2 pete nurse NaN NaN NaN left_only
2 SSS3 wilson prof NaN NaN NaN left_only
3 NaN NaN NaN MOIU8 koko nurse right_only
4 NaN NaN NaN HYA1 andray prof right_only
CodePudding user response:
You could try:
#exists in df1 but not in df2
df1_unique = df1[~df1["id"].isin(df2["pk"])]
#exists in df2 but not in df1
df2_unique = df2[~df2["pk"].isin(df1["id"])]
>>> df1_unique
id name job
1 ZZZ2 pete nurse
2 SSS3 wilson prof
>>> df2_unique
pk name job
1 MOIU8 koko nurse
2 HYA1 andray prof