Home > Blockchain >  Add new Pandas Column based on whether list items in a column are inside a list
Add new Pandas Column based on whether list items in a column are inside a list

Time:11-20

I have a "list_of_sendlists" with sample items like this:

list_of_sendlistsA = ["VIP","Officials","2021Customers","2020Customers"]

and I have a dataframe that contains some email addresses, and the "send lists" that are assigned to them: (Note: for d and e below, there are values in the lists that are not in "list_of_sendlists"

email   listsB
[email protected] VIP,Officials
[email protected] Officials
[email protected] 
[email protected] Non-factor
[email protected] Officials,Resigned

The tasks I want to manage is to add a column "on_list", if the email address has a value inside the "listB" that matches with "list_of_sendlistsA"

email   listsB              on_list
[email protected] VIP,Officials       Yes
[email protected] Officials           Yes
[email protected]                     No
[email protected] Non-factor          No
[email protected] Officials,Resigned  Yes

I tried df.listsB = df.listB.str.split(',') to turn the "listB" column into a Python List,

and then df['onlist'] = df.apply(lambda k: any(x in df['sendlist'] for x in k), axis = 1)

in order to return a "True" value if there are intersections between the two lists for each email address, but I couldn't get what I want (all returned value of df['onlist'] were false)

May I know if someone could help me?

Thanks so much!

CodePudding user response:

Use set.disjoint for False if disjoint values, so possible pass to numpy.where:

list_of_sendlistsA = ["VIP","Officials","2021Customers","2020Customers"]
mask = df.listsB.fillna('').str.split(',').map(set(list_of_sendlistsA).isdisjoint)
df['onlist'] = np.where(mask,'No','Yes' )

print (df)
     email              listsB onlist
0  [email protected]       VIP,Officials    Yes
1  [email protected]           Officials    Yes
2  [email protected]                 NaN     No
3  [email protected]          Non-factor     No
4  [email protected]  Officials,Resigned    Yes

Your solution should be change:

df['listsB'] = df.listsB.str.split(',')
mask = df['listsB'].fillna('').apply(lambda k: any(x in list_of_sendlistsA for x in k))
df['onlist'] = np.where(mask, 'Yes', 'No')
  • Related