I have two dataframes with 2 similar columns "date" and "id_number" and I want to find all the id_number missing from the second table to compare.
Here's my code:
import pandas as pd
data1 = {'date': ['7/09/22', '7/09/22', '7/09/22'],
'second_column': ['first_value', 'second_value', 'third_value'],
'id_number':['AA576bdk89', 'GG6jabkhd589', 'BXV6jabd589'],
'fourth_column':['first_value', 'second_value', 'third_value'],
}
data2 = {'date': ['7/09/22', '7/09/22', '7/09/22', '7/09/22', '7/09/22', '7/09/22'],
'second_column': ['first_value', 'second_value', 'third_value','fourth_value', 'fifth_value','sixth_value'],
'id_number':['AA576bdk89', 'GG6jabkhd589', 'BXV6jabd589','BXV6mkjdd589','GGdbkz589', 'BXhshhsd589'],
'fourth_column':['first_value', 'second_value', 'third_value','fourth_value', 'fifth_value','sixth_value'],
}
df1 = pd.DataFrame(data1)
df2 = pd.DataFrame(data2)
print (df1)
print ('\n')
print (df2)
CodePudding user response:
If need compare per date
and id_number
use left join with indicator
parameter:
m = df2.merge(df1, on=['date','id_number'], how='left', indicator=True)['_merge'].ne('both')
df = df2[m]
print (df)
date second_column id_number fourth_column
3 7/09/22 fourth_value BXV6mkjdd589 fourth_value
4 7/09/22 fifth_value GGdbkz589 fifth_value
5 7/09/22 sixth_value BXhshhsd589 sixth_value
Or if need compare only by id_number
use Series.isin
:
df = df2[~df2.id_number.isin(df1.id_number)]
print (df)
date second_column id_number fourth_column
3 7/09/22 fourth_value BXV6mkjdd589 fourth_value
4 7/09/22 fifth_value GGdbkz589 fifth_value
5 7/09/22 sixth_value BXhshhsd589 sixth_value