Hi, I have a dataset like this:
import pandas as pd

dx = pd.DataFrame({'IDs': [1234, 5346, 1234, 8793, 8793],
                   'Names': ['APPLE ABCD ONE', 'APPLE ABCD', 'NO STRAWBERRY YES',
                             'ORANGE AVAILABLE', 'TEA AVAILABLE']})
kw = ['APPLE', 'ORANGE', 'LEMONS', 'STRAWBERRY', 'BLUEBERRY', 'TEA COFFEE']
# 1 if any keyword occurs in the name, otherwise 0
dx['Check'] = dx['Names'].apply(lambda x: 1 if any(k in x for k in kw) else 0)
Instead of returning 1 or 0, I want the new column to contain the matching keyword from kw, such as 'APPLE', 'ORANGE' or 'TEA COFFEE'.
I hope someone can help me.
Thank you!
Answer:
Use a regex with str.extract to benefit from vectorized speed:
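import re
# Build one alternation pattern; re.escape guards against regex metacharacters in the keywords
regex = '|'.join(map(re.escape, kw))
# expand=False returns a Series (single capture group) instead of a one-column DataFrame
dx['Check'] = dx['Names'].str.extract(f'({regex})', expand=False)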
NB: this only returns the first match. If you want all matches, use str.extractall and then aggregate the matches for each row.
Output:
    IDs              Names       Check
0  1234     APPLE ABCD ONE       APPLE
1  5346         APPLE ABCD       APPLE
2  1234  NO STRAWBERRY YES  STRAWBERRY
3  8793   ORANGE AVAILABLE      ORANGE
4  8793      TEA AVAILABLE         NaN
Row 4 is NaN because 'TEA AVAILABLE' does not contain the full keyword 'TEA COFFEE'.
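A minimal sketch of the extractall variant mentioned in the NB, assuming the dx and kw from the question. The column name All and the sample string 'APPLE AND ORANGE' are hypothetical, used only for illustration. extractall yields one row per match, indexed by (original row, match number), so grouping on the first index level collapses each original row's matches into one comma-separated string:

```python
import re

import pandas as pd

dx = pd.DataFrame({'IDs': [1234, 5346, 1234, 8793, 8793],
                   'Names': ['APPLE ABCD ONE', 'APPLE ABCD', 'NO STRAWBERRY YES',
                             'ORANGE AVAILABLE', 'TEA AVAILABLE']})
kw = ['APPLE', 'ORANGE', 'LEMONS', 'STRAWBERRY', 'BLUEBERRY', 'TEA COFFEE']

regex = '|'.join(map(re.escape, kw))

# One row per match; column 0 holds the text of the (unnamed) capture group
matches = dx['Names'].str.extractall(f'({regex})')[0]

# Join all matches of each original row; rows with no match are missing from
# `matches`, so index alignment leaves them as NaN in the new column
dx['All'] = matches.groupby(level=0).agg(', '.join)
print(dx)

# Hypothetical string containing two keywords, to show the aggregation step
demo = pd.Series(['APPLE AND ORANGE'])
demo_result = demo.str.extractall(f'({regex})')[0].groupby(level=0).agg(', '.join).tolist()
print(demo_result)
```

With the sample data each name contains at most one keyword, so All matches Check; the hypothetical 'APPLE AND ORANGE' shows how several matches are combined.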
Answer:
Would this work?
# Returns a list of every keyword that occurs in each name ([] if none match)
dx['Check'] = dx['Names'].apply(lambda x: [k for k in kw if k in x])
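A sketch of how the list-comprehension approach behaves on the question's data, plus two hypothetical variants (the column names First and Joined are illustrative only): next() with a default gives a single keyword or None, and Series.str.join flattens each list into a plain string:

```python
import pandas as pd

dx = pd.DataFrame({'IDs': [1234, 5346, 1234, 8793, 8793],
                   'Names': ['APPLE ABCD ONE', 'APPLE ABCD', 'NO STRAWBERRY YES',
                             'ORANGE AVAILABLE', 'TEA AVAILABLE']})
kw = ['APPLE', 'ORANGE', 'LEMONS', 'STRAWBERRY', 'BLUEBERRY', 'TEA COFFEE']

# All matching keywords per row, as a list (empty list when nothing matches)
dx['Check'] = dx['Names'].apply(lambda x: [k for k in kw if k in x])

# Only the first keyword (in kw order) found in the name, or None if none is found
dx['First'] = dx['Names'].apply(lambda x: next((k for k in kw if k in x), None))

# Each list joined into one string; an empty list becomes ''
dx['Joined'] = dx['Check'].str.join(', ')
print(dx)
```

Unlike str.extract, which returns the leftmost match in the string, next() returns the first match in kw order. Both approaches are plain substring tests, so 'APPLE' would also match inside 'PINEAPPLE'.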