Count strings in Series Python

Time:02-12

I have the following dataframe:

import pandas as pd

df = pd.DataFrame({'X': ['Ciao, I would like to count the number of occurrences in this text considering negations that can change the meaning of the sentence',
                    "Hello, not number of negations, in this case we need to take care of the negation.",
                    "Hello world, don't number is another case in which where we need to consider negations."]})

I would like to count how many times certain strings appear in each of those sentences. So I simply do:

d = pd.DataFrame(['need'], columns = ['D'])
df['X'].str.count('|'.join(d.append({'D': 'number'}, ignore_index = True).D))

0    1
1    2
2    2
Name: X, dtype: int64

However, in my application I need to loop over each element of df, which means:

res=[]
for i in range(len(df)):
    f = df['X'][i].count('|'.join(d.append({'D': 'number'}, ignore_index = True).D))
    res.append(f)

[0,0,0]

I get two different results. The first one is obviously correct.

How can I fix it?

Thanks!

CodePudding user response:

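Python's built-in str.count counts literal substrings, so it searches for the exact text 'need|number' and finds nothing, whereas Series.str.count treats the pattern as a regular expression. Use re.findall while looping with iterrows: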

import re

words = ['need', 'number']

res = {}
for idx, row in df.iterrows():
    count = len(re.findall('|'.join(words), row['X']))
    res[idx] = count
df['count'] = pd.Series(res)

Output:

>>> df
                                                   X  count
0  Ciao, I would like to count the number of occu...      1
1  Hello, not number of negations, in this case w...      2
2  Hello world, don't number is another case in w...      2
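
To see why the loop in the question returns zeros, the two counting methods can be compared on a single sentence. This is a minimal sketch using only the standard library; the variable names are illustrative and not taken from the question.

```python
import re

text = "Hello, not number of negations, in this case we need to take care of the negation."
pattern = "need|number"

# str.count searches for the literal substring "need|number", which never occurs.
literal_count = text.count(pattern)

# re.findall treats "|" as alternation and returns every match of either word.
regex_count = len(re.findall(pattern, text))

print(literal_count, regex_count)  # 0 2
```

Series.str.count behaves like the second call, which is why the vectorized version in the question gives the expected counts.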

CodePudding user response:

I think the quickest fix is to use a function that counts regex matches, since str.count on a plain Python string only counts literal substrings. You can try something like this:

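import re

# DataFrame.append was removed in pandas 2.0, so the word list is built with pd.concat;
# the pattern never changes, so it is built once, outside the loop
pattern = '|'.join(pd.concat([d, pd.DataFrame({'D': ['number']})], ignore_index=True).D)

res = []
for i in range(len(df)):
    text = df['X'][i]
    # re.findall treats '|' as alternation, unlike str.count
    count = len(re.findall(pattern, text))
    res.append(count)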
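
A more compact variant of the same idea compiles the pattern once and uses a list comprehension. It is only a sketch: the plain list `sentences` stands in for df['X'], the list `words` stands in for the D column, and re.escape is an added precaution so that any regex metacharacters in the words are matched literally.

```python
import re

# Stand-in for df['X'] from the question
sentences = [
    "Ciao, I would like to count the number of occurrences in this text considering negations that can change the meaning of the sentence",
    "Hello, not number of negations, in this case we need to take care of the negation.",
    "Hello world, don't number is another case in which where we need to consider negations.",
]
# Stand-in for the words held in the D column
words = ["need", "number"]

# Escape each word, join with "|" for alternation, and compile once
pattern = re.compile("|".join(re.escape(w) for w in words))

# One count per sentence: the number of non-overlapping matches
res = [len(pattern.findall(s)) for s in sentences]
print(res)  # [1, 2, 2]
```

Note that this also counts matches inside longer words (for example "need" inside "needs"). If only whole words should count, the alternation can be wrapped as r"\b(?:need|number)\b".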