Count occurrences of list items in dataframe

I have a dataframe and a list of words. I want to count how many times the words from the list occur in each cell of the dataframe.

text
this is a test sentence
another sentence

list = ["this", "test", "break"]

Desired result:

text	occurence_count
this is a test sentence	2
another sentence	0

My code does not work:

df["occurence_count"] = [df["text"].count(x) for x in list]

CodePudding user response:

Perhaps you can do this:

a = ['this', 'test', 'break']  # 'list' shouldn't be used as a variable name

df['occurence_count'] = (
    df['text'].str.split().explode()
    .isin(set(a)).groupby(level=0).sum()
)
>>> df
                      text  occurence_count
0  this is a test sentence                2
1         another sentence                0
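
To see what each step of the chain above produces, it can be unpacked into intermediate variables. This is only an illustrative sketch: the sample dataframe is rebuilt from the question, and the names words, tokens, matches and counts are made up here, not part of the original answer.

```python
import pandas as pd

# Sample data rebuilt from the question
df = pd.DataFrame({"text": ["this is a test sentence", "another sentence"]})
words = ["this", "test", "break"]  # hypothetical name, avoids shadowing 'list'

# One row per word; the index repeats the label of the row the word came from
tokens = df["text"].str.split().explode()

# True where the word is one of the search words
matches = tokens.isin(set(words))

# Add up the True values per original row (index level 0)
counts = matches.groupby(level=0).sum()

# Assignment aligns on the index, so each count lands on its own row
df["occurence_count"] = counts
print(df)
```

Because every token is checked individually, a word that appears twice in a cell is counted twice.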

CodePudding user response:

You can do:

import re
l = ['this', 'test', 'break']
s = set(l)
# split each cell on runs of whitespace, then count the distinct words also found in s
df['occurence_count'] = df['text'].apply(
    lambda x: len(set(re.split(r'\s+', x)).intersection(s)))

This splits each cell into words, turns them into a set, intersects that set with the set built from your list, and takes the length of the result.

(BTW, don't use list as a variable name: it's the name of a built-in type in Python, and reassigning it shadows the built-in.)

output:

                      text  occurence_count
0  this is a test sentence                2
1         another sentence                0
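
One thing worth checking is how the two approaches in this thread behave when a word repeats inside a cell. The sketch below runs both on a made-up row with a repeated word. The sample text and the names words, total and distinct are assumptions for illustration, and the split pattern is assumed to be \s+ (one or more whitespace characters).

```python
import re
import pandas as pd

# Hypothetical sample: the first row repeats the word "this"
df = pd.DataFrame({"text": ["this this test", "another sentence"]})
words = {"this", "test", "break"}

# explode/isin approach: every matching token is counted
total = df["text"].str.split().explode().isin(words).groupby(level=0).sum()

# set-intersection approach: each matching word is counted at most once
distinct = df["text"].apply(lambda x: len(set(re.split(r"\s+", x)) & words))

print(total.tolist())     # [3, 0]
print(distinct.tolist())  # [2, 0]
```

On the data from the question both give the same result, since no word repeats there. Which one is right depends on whether repeated words should count once or every time.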