I have a "list_of_sendlists" with sample items like this:
list_of_sendlistsA = ["VIP","Officials","2021Customers","2020Customers"]
and I have a dataframe that contains some email addresses, and the "send lists" that are assigned to them: (Note: for d and e below, there are values in the lists that are not in "list_of_sendlists"
email listsB
[email protected] VIP,Officials
[email protected] Officials
[email protected]
[email protected] Non-factor
[email protected] Officials,Resigned
The tasks I want to manage is to add a column "on_list", if the email address has a value inside the "listB" that matches with "list_of_sendlistsA"
email listsB on_list
[email protected] VIP,Officials Yes
[email protected] Officials Yes
[email protected] No
[email protected] Non-factor No
[email protected] Officials,Resigned Yes
I tried df.listsB = df.listB.str.split(',')
to turn the "listB" column into a Python List,
and then df['onlist'] = df.apply(lambda k: any(x in df['sendlist'] for x in k), axis = 1)
in order to return a "True" value if there are intersections between the two lists for each email address, but I couldn't get what I want (all returned value of df['onlist']
were false)
May I know if someone could help me?
Thanks so much!
CodePudding user response:
Use set.disjoint
for False
if disjoint
values, so possible pass to numpy.where
:
list_of_sendlistsA = ["VIP","Officials","2021Customers","2020Customers"]
mask = df.listsB.fillna('').str.split(',').map(set(list_of_sendlistsA).isdisjoint)
df['onlist'] = np.where(mask,'No','Yes' )
print (df)
email listsB onlist
0 [email protected] VIP,Officials Yes
1 [email protected] Officials Yes
2 [email protected] NaN No
3 [email protected] Non-factor No
4 [email protected] Officials,Resigned Yes
Your solution should be change:
df['listsB'] = df.listsB.str.split(',')
mask = df['listsB'].fillna('').apply(lambda k: any(x in list_of_sendlistsA for x in k))
df['onlist'] = np.where(mask, 'Yes', 'No')