How to count the number of words said by someone in a pandas dataframe

I have a dataframe like this, and I'm trying to count how many times a specific word is said by each author.

Author              Text                   Date
Jake                hey hey my names Jake  1.04.1997
Mac                 hey my names Mac       1.02.2019
Sarah               heymy names Sarah      5.07.2001

I've been trying to set it up so that if I search for the word "hey" it produces:

Author              Count
Jake                2
Mac                 1

CodePudding user response：

Use Series.str.count, then group by author and aggregate with sum:

df1 = df['Text'].str.count('hey').groupby(df['Author']).sum().reset_index(name='Count')
print(df1)
  Author  Count
0   Jake      2
1    Mac      1
2  Sarah      1

This counts hey as a substring, so heymy in Sarah's text is counted too.

If you need to filter out rows with 0 matches, add boolean indexing (with the sample data above every text contains hey, so nothing is dropped):

s = df['Text'].str.count('hey')
df1 = s[s.gt(0)].groupby(df['Author']).sum().reset_index(name='Count')
print(df1)
  Author  Count
0   Jake      2
1    Mac      1
2  Sarah      1

EDIT: To match hey only as a separate word, wrap it in word boundaries \b...\b:

df1 = df['Text'].str.count(r'\bhey\b').groupby(df['Author']).sum().reset_index(name='Count')
print(df1)
  Author  Count
0   Jake      2
1    Mac      1
2  Sarah      0

s = df['Text'].str.count(r'\bhey\b')
df1 = s[s.gt(0)].groupby(df['Author']).sum().reset_index(name='Count')
print(df1)
  Author  Count
0   Jake      2
1    Mac      1
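
Since Series.str.count treats its pattern as a regular expression, a search word containing characters such as . or ? should probably be escaped before the word boundaries are added. A minimal sketch, assuming the column names from the question; the helper name count_word_by_author and its ignore_case parameter are made up for illustration, and it assumes the word starts and ends with a letter or digit so that \b behaves as expected:

```python
import re

import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    'Author': ['Jake', 'Mac', 'Sarah'],
    'Text': ['hey hey my names Jake', 'hey my names Mac', 'heymy names Sarah'],
    'Date': ['1.04.1997', '1.02.2019', '5.07.2001'],
})


def count_word_by_author(frame, word, ignore_case=False):
    """Hypothetical helper: whole-word occurrences of `word` per author."""
    # re.escape keeps regex metacharacters in `word` literal.
    pattern = r'\b' + re.escape(word) + r'\b'
    flags = re.IGNORECASE if ignore_case else 0
    counts = frame['Text'].str.count(pattern, flags=flags)
    # Keep only rows with at least one match, then total per author.
    mask = counts.gt(0)
    return (counts[mask]
            .groupby(frame.loc[mask, 'Author'])
            .sum()
            .reset_index(name='Count'))


result = count_word_by_author(df, 'hey')
print(result)
```

Passing ignore_case=True would also count Hey or HEY, which may or may not be what you want.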

CodePudding user response：

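If df is your original dataframe (and pandas is imported as pd):

# Start from the author column and add a per-row count of "hey"
newDF = pd.DataFrame(columns=['Author', 'Count'])
newDF['Author'] = df['Author']
newDF['Count'] = df['Text'].str.count("hey")
# Drop the rows where the word does not occur
newDF.drop(newDF[newDF['Count'] == 0].index, inplace=True)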

CodePudding user response：

Try this:

import pandas as pd

df = pd.DataFrame([('Jake', 'hey hey my names Jake', '1.04.1997'),
                   ('Mac', 'hi my names Mac', '1.02.2019'),
                   ('Sarah', 'heymy names Sarah', '5.07.2001')],
                  columns=['Author', 'Text', 'Date'])

# Count occurrences of "hey" in each row
df['Count'] = df['Text'].str.count('hey')
# Show only the rows with at least one occurrence
print(df.loc[df['Count'] > 0, ['Author', 'Count']])
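
Note that the per-row approach gives one row per message, so an author who appears several times gets several rows. A rough sketch of totalling per author before filtering, using made-up data (the second Jake row is not from the question):

```python
import pandas as pd

# Hypothetical data: Jake appears in two rows (not from the question).
df = pd.DataFrame({
    'Author': ['Jake', 'Mac', 'Jake'],
    'Text': ['hey hey my names Jake', 'hi my names Mac', 'hey again'],
})

# Per-row count of the substring "hey".
df['Count'] = df['Text'].str.count('hey')

# Sum the per-row counts for each author, then drop authors with no matches.
totals = df.groupby('Author', as_index=False)['Count'].sum()
totals = totals[totals['Count'] > 0]
print(totals)
```

Swapping 'hey' for r'\bhey\b' would restrict this to whole-word matches.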