pandas: count occurrences of list elements in a column

I am trying to count the occurrences of each element of a list within a DataFrame column.

For example:

xlst = ['pak', 'vector', 'word', 'po']


df:

col A   col B  col C
pk-121  abc    pak is going great
pk-112  xyz    word is word my friend
pk-132  agh    vector needs working
pk-321  jkl    pak is winning
pk-333  yul    vector now

Desired df:

word    count
pak     2
word    1
vector  2

CodePudding user response:

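You can use a regex to extract the words from xlst, remove repeated matches within the same row with drop_duplicates (so that "word is word my friend" counts once), then tally with value_counts: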

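import re

# escape each term so regex metacharacters in xlst are matched literally
pattern = f"(?P<word>{'|'.join(map(re.escape, xlst))})"

out = (df['col C']
 .str.extractall(pattern)            # one row per match
 .droplevel('match').reset_index()   # original row label becomes a column
 .drop_duplicates()['word']          # count a word at most once per row
 .value_counts().reset_index(name='count')
)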

Output (on pandas < 2.0; from pandas 2.0 onward the first column is named word instead of index):

    index  count
0     pak      2
1  vector      2
2    word      1
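
For anyone who wants to run this end to end, here is a self-contained sketch of the same regex approach. The DataFrame is rebuilt from the sample in the question. The \b word boundaries are an addition, on the assumption that only whole words should count (so 'po' would not match inside a longer word), and rename_axis is used so the column names should come out as word/count whatever the pandas version.

```python
import re
import pandas as pd

xlst = ['pak', 'vector', 'word', 'po']
df = pd.DataFrame({
    'col A': ['pk-121', 'pk-112', 'pk-132', 'pk-321', 'pk-333'],
    'col B': ['abc', 'xyz', 'agh', 'jkl', 'yul'],
    'col C': ['pak is going great', 'word is word my friend',
              'vector needs working', 'pak is winning', 'vector now'],
})

# Whole-word pattern (assumption: substrings should not count);
# re.escape guards against regex metacharacters in xlst.
pattern = rf"\b(?P<word>{'|'.join(map(re.escape, xlst))})\b"

out = (df['col C']
       .str.extractall(pattern)            # one row per match
       .droplevel('match').reset_index()   # original row label becomes a column
       .drop_duplicates()['word']          # count a word at most once per row
       .value_counts()
       .rename_axis('word')                # name the index explicitly
       .reset_index(name='count'))
print(out)
```

On the sample data this gives pak 2, vector 2 and word 1; words from xlst that never occur (here 'po') do not appear in the result.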

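Alternative using str.get_dummies, which splits each string on spaces and therefore matches whole words only; reindex then keeps just the words in xlst, including those that never occur: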

out = df['col C'].str.get_dummies(sep=' ').reindex(columns=xlst).sum()

Output (the counts are floats because reindex adds the missing po column as NaN):

pak       2.0
vector    2.0
word      1.0
po        0.0
dtype: float64
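
If integer counts in the word/count layout from the question are preferred, a variant along these lines should work. It is only a sketch: passing fill_value=0 to reindex and the final filtering step are additions not in the answer above, and the sample column is rebuilt from the question.

```python
import pandas as pd

xlst = ['pak', 'vector', 'word', 'po']
col = pd.Series(['pak is going great', 'word is word my friend',
                 'vector needs working', 'pak is winning', 'vector now'],
                name='col C')

counts = (col.str.get_dummies(sep=' ')            # 0/1 indicator per distinct token
             .reindex(columns=xlst, fill_value=0) # absent words become 0 rather than NaN
             .sum())                              # number of rows containing each word

# Reshape to the two-column layout from the question.
out = counts.rename_axis('word').reset_index(name='count')

# Optional: drop the words that never occur.
found = out[out['count'] > 0]
print(out)
```

Because get_dummies produces one indicator per row, a word repeated within a single row is still counted once, which matches the desired output in the question.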