How to combine two dataframes and put 0 and 1 if there are matches between columns?-CodePudding

I have two dataframes in Pandas:

name	number
apple	102
apple	104
banana	103
banana	105

and the other one (that's the real and true dataset of comparison)

name	number
apple	102
banana	105

I want to create another column to the first dataframe (called match) and put 0 if the number from first dataframe IS NOT in the second (and real) dataframe and 1 if it is. So basically, a left join, but to add a new column for seeing if the number is present in the other dataset.

The output should be:

name	number	match
apple	102	1
apple	104	0
banana	103	0
banana	105	1

CodePudding user response：

You can use .isin() to check if the element appears in another series and .astype() to convert True/False into 1/0:

df1['match'] = df1['number'].isin(df2['number']).astype(int)

Output:

      name  number  match
0    apple     102      1
1    apple     104      0
2   banana     103      0
3   banana     105      1

EDIT:

If not only number should be matched, but name as well, you can use .isin() with .index:

df1['match'] = (df1.set_index(['name', 'number']).index
                   .isin(df2.set_index(['name', 'number']).index)
                   .astype(int)
               )

CodePudding user response：

You can use a left merge with an assigned column that you fillna afterwards:

df1.merge(df2.assign(match=1), how='left').fillna({'match': 0}, downcast='infer')

Another idea would be to use the indicator parameter:

(df1.merge(df2, how='left', indicator='match')
    .assign(match=lambda d: d['match'].eq('both').astype(int))
 )

output:

     name  number  match
0   apple     102      1
1   apple     104      0
2  banana     103      0
3  banana     105      1

CodePudding user response：

for i in range(len(df)):
    if df.loc[i, 'name'] in df_2.name.values and df.loc[i, 'number'] in df_2.number.values:
       df.loc[i, 'match'] = 1
    else:
        df.loc[i, 'match'] = 0