search word in list all pandas columns-CodePudding

Below is my DF

df = pd.DataFrame({'a' : ['NYC', 'NYC', 'Boston', 'LA', 'SF', 'NYC'], 'b' : ['Other', 'Other', 'NY', 'NUI', 'SD', 'SF']})

    a   b
0   NYC Other
1   NYC Other
2   Boston  NY
3   LA  NUI
4   SF  SD
5   NYC SF

The aim is to check if list of words is in the df

Below is the code to check for a specific word

word = 'SF'
mask = np.column_stack([df[col].str.contains(word, na=False) for col in df])
df.loc[mask.any(axis=1)]


a   b
4   SF  SD
5   NYC SF

How can this be performed with list and not one string ?

word = ['SF', 'NY']

CodePudding user response：

You could form a regex alternation from the terms and use that:

word = ['SF', 'NY']
regex = r'^(?:'   r'|'.join(word)   ')$'
mask = np.column_stack([df[col].str.contains(regex, na=False) for col in df])
df.loc[mask.any(axis=1)]

CodePudding user response：

If you want to match exact words, use isin combined with any:

word = ['SF', 'NY']

df[df.isin(word).any(1)]

output:

        a   b
2  Boston  NY
4      SF  SD
5     NYC  SF

intermediates:

df.isin(word)

       a      b
0  False  False
1  False  False
2  False   True
3  False  False
4   True  False
5  False   True

df.isin(word).any(1)

0    False
1    False
2     True
3    False
4     True
5     True
dtype: bool

For a regex match combine apply and str.contains:

word = ['SF', 'NY']
regex = '|'.join(word)
df[df.apply(lambda c: c.str.contains(regex)).any(1)]

output:

        a      b
0     NYC  Other
1     NYC  Other
2  Boston     NY
4      SF     SD
5     NYC     SF