I have two dataframes in Pandas:
name | number |
---|---|
apple | 102 |
apple | 104 |
banana | 103 |
banana | 105 |
and the other one (that's the real and true dataset of comparison)
name | number |
---|---|
apple | 102 |
banana | 105 |
I want to create another column to the first dataframe (called match) and put 0 if the number from first dataframe IS NOT in the second (and real) dataframe and 1 if it is. So basically, a left join, but to add a new column for seeing if the number is present in the other dataset.
The output should be:
name | number | match |
---|---|---|
apple | 102 | 1 |
apple | 104 | 0 |
banana | 103 | 0 |
banana | 105 | 1 |
CodePudding user response:
You can use .isin()
to check if the element appears in another series and .astype()
to convert True/False into 1/0:
df1['match'] = df1['number'].isin(df2['number']).astype(int)
Output:
name number match
0 apple 102 1
1 apple 104 0
2 banana 103 0
3 banana 105 1
EDIT:
If not only number should be matched, but name as well, you can use .isin()
with .index
:
df1['match'] = (df1.set_index(['name', 'number']).index
.isin(df2.set_index(['name', 'number']).index)
.astype(int)
)
CodePudding user response:
You can use a left merge
with an assigned column that you fillna
afterwards:
df1.merge(df2.assign(match=1), how='left').fillna({'match': 0}, downcast='infer')
Another idea would be to use the indicator
parameter:
(df1.merge(df2, how='left', indicator='match')
.assign(match=lambda d: d['match'].eq('both').astype(int))
)
output:
name number match
0 apple 102 1
1 apple 104 0
2 banana 103 0
3 banana 105 1
CodePudding user response:
for i in range(len(df)):
if df.loc[i, 'name'] in df_2.name.values and df.loc[i, 'number'] in df_2.number.values:
df.loc[i, 'match'] = 1
else:
df.loc[i, 'match'] = 0