I have a dataframe and a list of words. For each cell of the dataframe, I want to count how many of the words in the list occur in it.
text |
---|
this is a test sentence |
another sentence |
list = ["this", "test", "break"]
Desired result:
text | occurence_count |
---|---|
this is a test sentence | 2 |
another sentence | 0 |
My code does not work:
df["occurence_count"] = [df["text"].count(x) for x in list]
CodePudding user response:
Perhaps you can do this:
a = ['this', 'test', 'break']  # 'list' shouldn't be used as a variable name
df['occurence_count'] = (
    df['text'].str.split().explode()
    .isin(set(a)).groupby(level=0).sum()
)
>>> df
text occurence_count
0 this is a test sentence 2
1 another sentence 0
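For reference, here is the same idea as a self-contained sketch that can be run as-is. The sample dataframe is rebuilt from the question, and the intermediate name `tokens` is only there for illustration: each text is split on whitespace, exploded to one word per row (keeping the original row index), checked against the word set, and summed back per original row.

```python
import pandas as pd

# Sample data rebuilt from the question
df = pd.DataFrame({"text": ["this is a test sentence", "another sentence"]})
words = ["this", "test", "break"]

# One token per row; the index still points at the original row
tokens = df["text"].str.split().explode()

# True/False per token, then summed per original row
df["occurence_count"] = tokens.isin(set(words)).groupby(level=0).sum()

print(df)
```

Because every token is checked individually, a word that appears twice in the same text is counted twice.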
CodePudding user response:
You can do this:
import re
l = ['this', 'test', 'break']
s = set(l)
df['occurence_count'] = df['text'].apply(
    lambda x: len(set(re.split(r'\s+', x)).intersection(s)))
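This splits each text into words, turns them into a set, intersects that set with your word set, and takes the length of the result, so each matching word is counted once per row.
(BTW, don't use list as a variable name: it's the name of a built-in type in Python, and reassigning it shadows the built-in.)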
output:
text occurence_count
0 this is a test sentence 2
1 another sentence 0
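One thing to be aware of: the set intersection counts distinct matching words, while the explode-based answer counts every occurrence. The two only differ when a word repeats within one text. A small sketch of the difference, assuming a made-up sentence with a repeated word (the names `total` and `distinct` are just for illustration):

```python
import re
import pandas as pd

words = {"this", "test", "break"}
# Made-up example: "test" appears twice in the first row
df = pd.DataFrame({"text": ["test this test", "another sentence"]})

# Every occurrence counts (explode-based approach)
total = df["text"].str.split().explode().isin(words).groupby(level=0).sum()

# Each matching word counts once per row (set-intersection approach)
distinct = df["text"].apply(
    lambda x: len(set(re.split(r"\s+", x.strip())) & words)
)

print(total.tolist(), distinct.tolist())
```

Which one is right depends on whether "how often" is meant per occurrence or per distinct word.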