python: test for existence of any of multiple strings in a text field to set new column value

I have been trying for a good while now and cannot find an answer online, so I'm sure someone can help.

I have a dataframe with a column that contains descriptive text, e.g.
"BALANCE SHRINKER - CORE"
Each row has a different text value.

I need to check for the existence of any of several words, say:
['LOB','LIFE','SHRINKER']
From the result (True/False), I want to create a new column set to 999 if any of the words is found in the text column being searched, or 0 otherwise.

I have tried this kind of approach but nothing works for me:
df['rule1'] = 999 if any(x in df['textcolumn'].str for x in ['LOB','LIFE','SHRINKER']) else 0

I've tried .find() and .contains() but to no avail.

So, I'm sure someone can advise!

Thanks for looking.
DT

CodePudding user response:

Use Series.str.contains to check whether each row of 'textcolumn' contains any of the words, producing a boolean Series. Then use Series.map to map the True values to 999 and the False values to 0.

# list of words to find in 'textcolumn'
words = ['LOB','LIFE','SHRINKER']
# regex pattern to search in 'textcolumn'
# '|' stands for OR. Read pat as "match 'LOB' OR 'LIFE' OR 'SHRINKER'"
pat = "|".join(words)

df['rule1'] = df['textcolumn'].str.contains(pat).map({True: 999, False: 0})
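
To see the approach end to end, here is a minimal self-contained sketch. The sample DataFrame is made up for illustration; only the first row's text comes from the question, and the intermediate name `mask` is my own.

```python
import pandas as pd

# Hypothetical sample data: only the first value appears in the question.
df = pd.DataFrame({
    "textcolumn": [
        "BALANCE SHRINKER - CORE",
        "TERM LIFE POLICY",
        "GENERAL LEDGER",
    ]
})

words = ["LOB", "LIFE", "SHRINKER"]
pat = "|".join(words)  # "LOB|LIFE|SHRINKER"

# Boolean Series: True where any of the words occurs in the row's text
mask = df["textcolumn"].str.contains(pat)

# Translate the booleans into the required codes
df["rule1"] = mask.map({True: 999, False: 0})
print(df)
```

Note that this is a substring match, so a value such as "GLOBAL" would also count as containing 'LOB'.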

Another option is to use numpy.where:

import numpy as np

words = ['LOB','LIFE','SHRINKER']
pat = "|".join(words)

df['rule1'] = np.where(df['textcolumn'].str.contains(pat), 999, 0)
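
If the real data might contain missing values, words with regex metacharacters, or longer words that merely include one of the search words, a somewhat more defensive variant could look like the sketch below. This goes beyond what the question asked for: the sample rows are hypothetical, and the whole-word matching via `\b` is an assumption about what is wanted.

```python
import re

import numpy as np
import pandas as pd

# Hypothetical sample data, including a missing value and a near miss ("GLOBAL")
df = pd.DataFrame({
    "textcolumn": ["BALANCE SHRINKER - CORE", "GLOBAL FUND", None, "LIFE (UK)"]
})

words = ["LOB", "LIFE", "SHRINKER"]

# re.escape keeps characters such as '.' or '(' in a word from being read as regex syntax.
# \b ... \b restricts matches to whole words (assumption: 'GLOBAL' should not match 'LOB').
# (?: ... ) is a non-capturing group, so pandas does not warn about match groups.
pat = r"\b(?:" + "|".join(re.escape(w) for w in words) + r")\b"

# na=False treats missing text as "not found" instead of propagating NaN
mask = df["textcolumn"].str.contains(pat, na=False)

df["rule1"] = np.where(mask, 999, 0)
print(df)
```

Drop the `\b` anchors if plain substring matching, as in the snippets above, is what you need.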