Numpy isin() is not returning expected result

Time:02-22

Based on the code below, I would expect the first element of the 'duplicate' column to be True, since 'juli' exists in 'df_set'. The real data set is much larger, which is why I convert the column to a set first.

What am I doing wrong that causes the first element of 'duplicate' to be False?

import numpy as np
import pandas as pd

data = [
    ['tom', 'juli'],
    ['nick', 'heather'],
    ['juli', 'john'],
    ['dustin', 'tracy']
]
columns = ['Name', 'Name2']

df = pd.DataFrame(data, columns = columns)
df_set = set(df['Name'])
df['duplicate'] = np.isin(df['Name2'], df_set, assume_unique=True)
print(df)

Output:

     Name    Name2  duplicate
0     tom     juli      False
1    nick  heather      False
2    juli     john      False
3  dustin    tracy      False

CodePudding user response:

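np.isin does not unpack a set. A set is not a sequence, so NumPy converts it to an object array with a single element (the set itself) rather than an array of its values, and nothing in 'Name2' equals that one element. Convert the set back to a list: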

df['duplicate'] = np.isin(df['Name2'], list(df_set), assume_unique=True)

Output:

>>> df
     Name    Name2  duplicate
0     tom     juli       True
1    nick  heather      False
2    juli     john      False
3  dustin    tracy      False
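
To see why the set fails, it may help to look at what NumPy does with it before comparing. The following is a minimal sketch using the names from the question; the variable names as_array, from_set and from_list are only for illustration.

```python
import numpy as np

names = {'tom', 'nick', 'juli', 'dustin'}

# A set is not a sequence, so NumPy wraps it whole in a 0-d object array
as_array = np.asarray(names)
print(as_array.shape, as_array.dtype)   # () object

# Each value is compared against the set object itself, never its members
from_set = np.isin(['juli', 'john'], names)

# Converting to a list first gives a 1-d array of the individual names
from_list = np.isin(['juli', 'john'], list(names))
print(from_set, from_list)
```

The first call matches nothing, while the second reports a match for 'juli' only, which is the behaviour the question expected.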

CodePudding user response:

Another way is to stay within pandas: Series.isin accepts a set directly.

df['duplicate'] = df['Name2'].isin(set(df['Name']))

Output:

     Name    Name2  duplicate
0     tom     juli       True
1    nick  heather      False
2    juli     john      False
3  dustin    tracy      False
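
For the larger data set mentioned in the question, names may well repeat, in which case passing assume_unique=True to np.isin would no longer be safe. A sketch of the pandas approach on such data follows; the rows are made up for illustration and are not from the original post.

```python
import pandas as pd

# Hypothetical data with a repeated name in each column
df = pd.DataFrame({
    'Name':  ['tom', 'nick', 'juli', 'juli'],
    'Name2': ['juli', 'heather', 'john', 'juli'],
})

# Series.isin accepts a set directly and makes no uniqueness assumption
df['duplicate'] = df['Name2'].isin(set(df['Name']))
print(df)
```

Both 'juli' rows in 'Name2' are flagged, and there is no uniqueness flag to get wrong.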