I have a dataframe like this, and I'm trying to count how many times each author says a specific word.
Author  Text                   Date
Jake    hey hey my names Jake  1.04.1997
Mac     hey my names Mac       1.02.2019
Sarah   heymy names Sarah      5.07.2001
I've been trying to set it up so that if I were to search for the word "hey", it would produce:
Author Count
Jake 2
Mac 1
CodePudding user response:
Use Series.str.count with an aggregate sum:
df1 = df['Text'].str.count('hey').groupby(df['Author']).sum().reset_index(name='Count')
print(df1)
  Author  Count
0   Jake      2
1    Mac      1
2  Sarah      1
If you need to filter out rows with 0 values, add boolean indexing:
s = df['Text'].str.count('hey')
df1 = s[s.gt(0)].groupby(df['Author']).sum().reset_index(name='Count')
print(df1)
  Author  Count
0   Jake      2
1    Mac      1
2  Sarah      1
EDIT: To match hey only as a separate word, add word boundaries (\b on each side of the pattern), like:
df1 = df['Text'].str.count(r'\bhey\b').groupby(df['Author']).sum().reset_index(name='Count')
print(df1)
  Author  Count
0   Jake      2
1    Mac      1
2  Sarah      0
s = df['Text'].str.count(r'\bhey\b')
df1 = s[s.gt(0)].groupby(df['Author']).sum().reset_index(name='Count')
print(df1)
  Author  Count
0   Jake      2
1    Mac      1
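To make the word-boundary version easy to try, here is a minimal self-contained sketch. It assumes the sample data from the question; the helper name count_word is hypothetical (not part of pandas), and re.escape is added as a precaution so that a search word containing regex characters is matched literally.

```python
import re
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    'Author': ['Jake', 'Mac', 'Sarah'],
    'Text': ['hey hey my names Jake', 'hey my names Mac', 'heymy names Sarah'],
    'Date': ['1.04.1997', '1.02.2019', '5.07.2001'],
})

def count_word(frame, word):
    """Hypothetical helper: whole-word occurrences of `word` per author."""
    # re.escape keeps characters such as "." or "+" in `word` literal
    pattern = r'\b' + re.escape(word) + r'\b'
    counts = frame['Text'].str.count(pattern)
    # Drop rows without a match, then sum the remaining counts per author
    return (counts[counts.gt(0)]
            .groupby(frame['Author'])
            .sum()
            .reset_index(name='Count'))

result = count_word(df, 'hey')
print(result)
```

With the question's data this should keep only Jake and Mac, since "heymy" does not contain hey as a separate word.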
CodePudding user response:
If df is your original dataframe:
import pandas as pd

# Count occurrences of "hey" in each row's text (one output row per input row)
newDF = pd.DataFrame({'Author': df['Author'], 'Count': df['Text'].str.count('hey')})
# Keep only the rows with at least one match
newDF = newDF[newDF['Count'] > 0]
CodePudding user response:
Try this:
import pandas as pd
# Sample data
df = pd.DataFrame([('Jake', 'hey hey my names Jake', '1.04.1997'),
                   ('Mac', 'hi my names Mac', '1.02.2019'),
                   ('Sarah', 'heymy names Sarah', '5.07.2001')],
                  columns=['Author', 'Text', 'Date'])
# Count occurrences of "hey" in each row
df['Count'] = df['Text'].str.count('hey')
# Show only the rows with at least one match
print(df.loc[df['Count'] > 0, ['Author', 'Count']])
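Note that this counts per row, so an author who appears in several rows would be listed several times. A rough sketch of one way to handle that, assuming a made-up sample in which Jake has two rows (this data is not from the question), is to sum the Count column per author before filtering:

```python
import pandas as pd

# Made-up sample in which Jake appears in two rows (an assumption for illustration)
df = pd.DataFrame([('Jake', 'hey hey my names Jake', '1.04.1997'),
                   ('Mac', 'hi my names Mac', '1.02.2019'),
                   ('Jake', 'hey again', '2.04.1997')],
                  columns=['Author', 'Text', 'Date'])

# Count occurrences of "hey" in each row
df['Count'] = df['Text'].str.count('hey')

# Sum the per-row counts so each author appears only once
per_author = df.groupby('Author', as_index=False)['Count'].sum()
# Keep only authors who said the word at least once
per_author = per_author[per_author['Count'] > 0]
print(per_author)
```

Here Jake's two rows are combined into a single row, and Mac is dropped because his total is 0.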