How to find a substring from a list and return the matching substring instead of only True or False

Time:09-27

Hi, I have a dataset that looks something like this:

import pandas as pd

dx = pd.DataFrame({'IDs': [1234, 5346, 1234, 8793, 8793],
                   'Names': ['APPLE ABCD ONE', 'APPLE ABCD', 'NO STRAWBERRY YES', 'ORANGE AVAILABLE', 'TEA AVAILABLE']})

kw = ['APPLE', 'ORANGE', 'LEMONS', 'STRAWBERRY', 'BLUEBERRY', 'TEA COFFEE']
# 1 if any keyword occurs as a substring of the name, else 0
dx['Check'] = dx['Names'].apply(lambda x: 1 if any(k in x for k in kw) else 0)

Instead of returning 1 or 0, I want the new column to contain the matching keyword from kw, such as 'APPLE', 'ORANGE' or 'TEA COFFEE'.

I hope someone can help me.

Thank you

CodePudding user response:

Use a regex with str.extract to benefit from vectorized speed:

import re

regex = '|'.join(map(re.escape, kw))
dx['Check'] = dx['Names'].str.extract(f'({regex})', expand=False)

Note: this returns only the first match in each row. If you want all matches, use str.extractall and add an aggregation step.

Output:

    IDs              Names       Check
0  1234     APPLE ABCD ONE       APPLE
1  5346         APPLE ABCD       APPLE
2  1234  NO STRAWBERRY YES  STRAWBERRY
3  8793   ORANGE AVAILABLE      ORANGE
4  8793      TEA AVAILABLE         NaN
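
The last row is NaN because 'TEA' alone is not in kw, only 'TEA COFFEE' is. For the str.extractall variant mentioned in the note, here is a minimal sketch of one way to do the aggregation step. It assumes you want all matches joined into one comma-separated string. The extra row 'APPLE AND ORANGE' is hypothetical and added only to show a multi-match case.

```python
import re
import pandas as pd

# Data from the question, plus one hypothetical row (ID 9999) with two keywords
dx = pd.DataFrame({'IDs': [1234, 5346, 1234, 8793, 8793, 9999],
                   'Names': ['APPLE ABCD ONE', 'APPLE ABCD', 'NO STRAWBERRY YES',
                             'ORANGE AVAILABLE', 'TEA AVAILABLE', 'APPLE AND ORANGE']})
kw = ['APPLE', 'ORANGE', 'LEMONS', 'STRAWBERRY', 'BLUEBERRY', 'TEA COFFEE']

regex = '|'.join(map(re.escape, kw))

# extractall returns one row per match, indexed by (original row, match number);
# column 0 holds the text captured by the single group
matches = dx['Names'].str.extractall(f'({regex})')[0]

# Aggregation step: join all matches of each original row into one string
all_matches = matches.groupby(level=0).agg(', '.join)

# Assignment aligns on the index, so rows without any match become NaN
dx['Check'] = all_matches
print(dx)
```

If a list per row is preferred over a joined string, `.agg(list)` can be used in place of `.agg(', '.join)`.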

CodePudding user response:

Would this work? It returns a list of all matching keywords for each row (an empty list when nothing matches):

dx['Check'] = dx['Names'].apply(lambda x: [k for k in kw if k in x])
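
If a single keyword per row is wanted rather than a list, a small variation on the same idea is to take the first hit with next(). This is only a sketch: the column name 'First' is made up for illustration, and "first" here means first in the order of kw, not first by position in the name.

```python
import pandas as pd

dx = pd.DataFrame({'IDs': [1234, 5346, 1234, 8793, 8793],
                   'Names': ['APPLE ABCD ONE', 'APPLE ABCD', 'NO STRAWBERRY YES',
                             'ORANGE AVAILABLE', 'TEA AVAILABLE']})
kw = ['APPLE', 'ORANGE', 'LEMONS', 'STRAWBERRY', 'BLUEBERRY', 'TEA COFFEE']

# All matching keywords per row, as a list (empty list if none match)
dx['Check'] = dx['Names'].apply(lambda x: [k for k in kw if k in x])

# First matching keyword in kw order, or None if no keyword matches
# ('First' is a hypothetical column name)
dx['First'] = dx['Names'].apply(lambda x: next((k for k in kw if k in x), None))
print(dx)
```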