I'm trying to select every row of a pandas.DataFrame where df['Title'] contains one (or more) of the elements of keywords.
Consider this list as keywords:
keywords = ['k_1', 'k_2', 'k_3', 'k_4']
I've tried this approach, which did not work for me:
df[df['Title'].str.contains(keywords)]
CodePudding user response:
df[df["Title"].apply(lambda x: any(k in x for k in keywords))]
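Note that this is a substring test, and `k in x` raises a TypeError if a title is missing (NaN or None). A minimal sketch that guards against that, assuming non-string titles should simply count as "no match" (the sample data and the helper name has_keyword are made up for illustration):

```python
import pandas as pd

keywords = ['k_1', 'k_2', 'k_3', 'k_4']
# Hypothetical sample data, including one missing title
df = pd.DataFrame({'Title': ['k_1 and k_2', 'k_3 alone', 'k_z not here', None]})

def has_keyword(title):
    # Non-string values (None/NaN) are treated as containing no keyword
    return isinstance(title, str) and any(k in title for k in keywords)

mask = df['Title'].apply(has_keyword)  # boolean Series, one entry per row
result = df[mask]                      # keeps only the rows with a keyword
print(result)
```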
CodePudding user response:
Create a regex pattern and use str.findall:
Setup:
import pandas as pd
df = pd.DataFrame({'Title': ['k_1 and k_2', 'k_3 alone', 'k_z not here']})
keywords = ['k_1', 'k_2', 'k_3', 'k_4']
pattern = fr"\b({'|'.join(keywords)})\b"
df['Keywords'] = df['Title'].str.findall(pattern)
Output:
>>> df
          Title    Keywords
0   k_1 and k_2  [k_1, k_2]
1     k_3 alone       [k_3]
2  k_z not here          []
>>> print(pattern)
\b(k_1|k_2|k_3|k_4)\b
Get rows:
>>> df[df['Title'].str.findall(pattern).astype(bool)]
         Title
0  k_1 and k_2
1    k_3 alone
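If only the matching rows are needed, str.contains can build the boolean mask directly. It expects a single pattern string rather than a list, which is most likely why the attempt in the question failed. The following is a sketch, not the only way to do it: re.escape is added on the assumption that keywords might contain regex metacharacters, the group is non-capturing so pandas should not warn about match groups, and na=False is my choice for treating missing titles as non-matches.

```python
import re
import pandas as pd

keywords = ['k_1', 'k_2', 'k_3', 'k_4']
df = pd.DataFrame({'Title': ['k_1 and k_2', 'k_3 alone', 'k_z not here']})

# Join the escaped keywords into one alternation, matched on word boundaries.
# (?:...) is a non-capturing group.
pattern = r"\b(?:" + "|".join(re.escape(k) for k in keywords) + r")\b"

# na=False makes missing titles count as "no match" instead of NaN
mask = df['Title'].str.contains(pattern, na=False)
rows = df[mask]
print(rows)
```

Drop the \b anchors if substring matches (for example 'k_1' inside 'xk_12') should also count.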