python: test for existence of any of multiple strings in a text field to set new column value

Time:11-15

I have been trying for a good while now and cannot find an answer online, so I'm hoping someone can help.

I have a dataframe with a column that contains descriptive text, e.g.
"BALANCE SHRINKER - CORE"
Each row has a different text value.

I need to check for the existence of any of multiple words:
['LOB','LIFE','SHRINKER'] say.
And from the result (True/False), create a new column set to 999 if any phrase is found in the text column being searched, or set to 0 otherwise.

I have tried this kind of approach but nothing works for me:
df['rule1'] = 999 if any(x in df['textcolumn'].str for x in ['LOB','LIFE','SHRINKER']) else 0

I've tried .find() and .contains() but to no avail.

So, I'm sure someone can advise!

Thanks for looking.
DT

CodePudding user response:

Use Series.str.contains to check whether each row of 'textcolumn' contains any of the words, producing a boolean Series. Then use Series.map to map the True values to 999 and the False values to 0.

# list of words to find in 'textcolumn'
words = ['LOB','LIFE','SHRINKER']
# regex pattern to search in 'textcolumn'
# '|' stands for OR. Read pat as "match 'LOB' OR 'LIFE' OR 'SHRINKER'"
pat = "|".join(words)

df['rule1'] = df['textcolumn'].str.contains(pat).map({True: 999, False: 0})
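
To see the approach above end to end, here is a minimal sketch on a small made-up dataframe. The sample rows are assumptions for illustration only (just the first one comes from the question); the column names 'textcolumn' and 'rule1' follow the question.

```python
import pandas as pd

# Hypothetical sample data; only the first row is taken from the question
df = pd.DataFrame({
    "textcolumn": [
        "BALANCE SHRINKER - CORE",
        "SAVINGS - PREMIUM",
        "LIFE COVER",
    ]
})

words = ['LOB', 'LIFE', 'SHRINKER']
pat = "|".join(words)  # 'LOB|LIFE|SHRINKER'

# Boolean Series: True where any of the words occurs in the text
mask = df['textcolumn'].str.contains(pat)

# Translate True/False into the requested 999/0 flag
df['rule1'] = mask.map({True: 999, False: 0})
print(df)
```

Note that this is substring matching, so a value such as "GLOBAL" would also be flagged because it contains "LOB".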

Another option is to use numpy.where:

import numpy as np

words = ['LOB','LIFE','SHRINKER']
pat = "|".join(words)

df['rule1'] = np.where(df['textcolumn'].str.contains(pat), 999, 0) 
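
If the text column can contain missing values, or if any search word contains regex special characters, the pattern above may need two small adjustments. The sketch below is one way to handle both, assuming a made-up dataframe and a hypothetical extra word 'CORE+' that is not in the original question: re.escape makes each word match literally, and na=False treats missing text as "not found".

```python
import re

import numpy as np
import pandas as pd

# Hypothetical sample data, including a missing value
df = pd.DataFrame({
    "textcolumn": [
        "BALANCE SHRINKER - CORE",
        None,
        "CORE+ PLAN",
        "SAVINGS",
    ]
})

# 'CORE+' is an illustrative word with a regex special character ('+')
words = ['LOB', 'LIFE', 'SHRINKER', 'CORE+']

# Escape each word so it is matched literally, then join with OR
pat = "|".join(re.escape(w) for w in words)

# na=False makes rows with missing text evaluate to False instead of NaN
found = df['textcolumn'].str.contains(pat, na=False)

df['rule1'] = np.where(found, 999, 0)
print(df)
```

With these rows, the first and third are flagged with 999, while the missing value and "SAVINGS" get 0.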