How to select rows that contain one or more keywords (from an existing list) in pandas?

Time:11-20

I'm trying to select every row of a pandas.DataFrame whose df['Title'] contains one (or more) of the elements of keywords.

Consider this list of keywords:
keywords = ['k_1', 'k_2', 'k_3', 'k_4']

I've tried this approach, which did not work for me:
df[df['Title'].str.contains(keywords)]
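As far as I know, Series.str.contains expects a single pattern string (or compiled regex) as pat, not a list, which is why passing keywords directly fails. A minimal sketch of the usual fix, assuming plain substring matching is acceptable: join the keywords into one regex alternation. The re.escape call and na=False (which treats missing titles as non-matches) are additions not in the original question.

```python
import re

import pandas as pd

df = pd.DataFrame({'Title': ['k_1 and k_2', 'k_3 alone', 'k_z not here']})
keywords = ['k_1', 'k_2', 'k_3', 'k_4']

# str.contains takes one pattern, so build "k_1|k_2|k_3|k_4" from the list.
# re.escape protects against keywords containing regex metacharacters.
pattern = '|'.join(map(re.escape, keywords))

# na=False makes missing titles count as "no keyword found".
selected = df[df['Title'].str.contains(pattern, regex=True, na=False)]
print(selected)
```

Note that this is still substring matching: a title containing "k_10" would match the keyword "k_1".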

Answer:

df[df["Title"].apply(lambda x: any(k in x for k in keywords))]
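A self-contained version of the apply/any approach above, for illustration. The extra title 'k_10 only' is a made-up row added here to show that `k in x` is a plain substring test, so "k_10" counts as containing "k_1". It also assumes every title is a string; a NaN title would raise a TypeError inside the lambda.

```python
import pandas as pd

df = pd.DataFrame({'Title': ['k_1 and k_2', 'k_3 alone', 'k_z not here', 'k_10 only']})
keywords = ['k_1', 'k_2', 'k_3', 'k_4']

# any() returns True as soon as one keyword is found as a substring of the title.
mask = df['Title'].apply(lambda title: any(k in title for k in keywords))
print(df[mask])  # rows 0, 1 and 3 ('k_10 only' matches 'k_1' as a substring)
```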

Answer:

Create a regex pattern and use str.findall:

Setup:

import pandas as pd

df = pd.DataFrame({'Title': ['k_1 and k_2', 'k_3 alone', 'k_z not here']})
keywords = ['k_1', 'k_2', 'k_3', 'k_4']
pattern = fr"\b({'|'.join(keywords)})\b"

df['Keywords'] = df['Title'].str.findall(pattern)

Output:

>>> df
          Title    Keywords
0   k_1 and k_2  [k_1, k_2]
1     k_3 alone       [k_3]
2  k_z not here          []

>>> print(pattern)
\b(k_1|k_2|k_3|k_4)\b
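The same word-boundary pattern can be extended if needed. A hedged sketch, assuming you also want case-insensitive matching and a per-row count of found keywords: Series.str.findall accepts a flags argument, and .str.len() works on the resulting lists. The uppercase 'K_1 and k_2' title and the n_keywords column name are illustrative assumptions, not part of the original answer.

```python
import re

import pandas as pd

df = pd.DataFrame({'Title': ['K_1 and k_2', 'k_3 alone', 'k_z not here']})
keywords = ['k_1', 'k_2', 'k_3', 'k_4']

# Word-boundary alternation as in the answer, with each keyword escaped.
pattern = fr"\b({'|'.join(map(re.escape, keywords))})\b"

# re.IGNORECASE lets 'K_1' match the keyword 'k_1'; the text is returned as written.
found = df['Title'].str.findall(pattern, flags=re.IGNORECASE)
df['Keywords'] = found
df['n_keywords'] = found.str.len()  # number of keyword hits per row
print(df)
```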

Get rows (an empty list is falsy, so astype(bool) keeps only rows with at least one keyword):

>>> df[df['Title'].str.findall(pattern).astype(bool)]
         Title    Keywords
0  k_1 and k_2  [k_1, k_2]
1    k_3 alone       [k_3]
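If only the row selection is needed, a sketch that skips building the lists: pass a word-boundary pattern straight to str.contains. It uses a non-capturing group (?:...) instead of (...), since str.contains only needs True/False and pandas emits a UserWarning when the pattern has capture groups. The 'k_10 only' row is a hypothetical title added to show that \b rejects partial matches.

```python
import re

import pandas as pd

df = pd.DataFrame({'Title': ['k_1 and k_2', 'k_3 alone', 'k_z not here', 'k_10 only']})
keywords = ['k_1', 'k_2', 'k_3', 'k_4']

# (?:...) groups the alternatives without capturing them.
pattern = fr"\b(?:{'|'.join(map(re.escape, keywords))})\b"

# \b requires whole-word matches, so 'k_10' is not treated as containing 'k_1'.
mask = df['Title'].str.contains(pattern, regex=True, na=False)
print(df[mask])
```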