identify present element in dataframe pandas? [duplicate]-CodePudding

i have a first dataframe :

 id ;  name ; job
 AAA1; reacher; doctor
 ZZZ2; pete; nurse
 SSS3; wilson; prof

and second dataframe :

 pk ;  name ; job
 AAA1; reacher; Doctor
 MOIU8; pete; Nurse
 HYA1; wilson; Prof

we suppose that code = pk = id

i want to identify code which exist in dataframe1 and not in the dataframe2 and which code exists in dataframe 2 and not in dataframe 1 with pandas

import pandas as pd 
    


# List1 
lst = [['AAA1', 'reacher', "doctor"], ['ZZZ2', 'pete', "nurse"],
       ['SSS3', 'wilson', "prof"], ]
    
df1 = pd.DataFrame(lst, columns =['id', 'name', 'job'], dtype = float)

# List1 
lst = [['AAA1', 'reacher', "doctor"], ['MOIU8', 'koko', "nurse"],
       ['HYA1', 'andray', "prof"], ]
    
df2 = pd.DataFrame(lst, columns =['pk', 'name', 'job'], dtype = float)

CodePudding user response：

You can perform an outer merge with an indicator. The '_merge' column will tell you if a value if present on the left (df1), right (df2) of both. And, you'll have a quick view of the differences in columns.

df1.merge(df2, left_on='id', right_on='pk', how='outer', indicator=True)

output:

     id   name_x   job_x     pk   name_y   job_y      _merge
0  AAA1  reacher  doctor   AAA1  reacher  doctor        both
1  ZZZ2     pete   nurse    NaN      NaN     NaN   left_only
2  SSS3   wilson    prof    NaN      NaN     NaN   left_only
3   NaN      NaN     NaN  MOIU8     koko   nurse  right_only
4   NaN      NaN     NaN   HYA1   andray    prof  right_only

CodePudding user response：

You could try:

#exists in df1 but not in df2
df1_unique = df1[~df1["id"].isin(df2["pk"])]

#exists in df2 but not in df1
df2_unique = df2[~df2["pk"].isin(df1["id"])]

>>> df1_unique
     id    name    job
1  ZZZ2    pete  nurse
2  SSS3  wilson   prof

>>> df2_unique
      pk    name    job
1  MOIU8    koko  nurse
2   HYA1  andray   prof