I have a dataframe with two columns: Parts of speech and word
df:
  Parts of speech   word
0            Noun    cat
1            Noun  water
2            Noun    cat
3            verb   draw
4            verb   draw
5             adj   slow
I want to group the top words by part of speech (what I expect):
Parts of speech  top
Noun             {'cat':2,'water':1}
verb             {'draw':2}
adj              {'slow':1}
I tried using groupby and apply, but I don't get what I need:
df2=df.groupby('Parts of speech')['word'].apply(lambda x : x.value_counts())
How do I create a dictionary of word counts for each part of speech?
CodePudding user response:
One approach is to aggregate using .agg with collections.Counter:
from collections import Counter
df2=df.groupby('Parts of speech')['word'].agg(Counter)
print(df2)
Output
Parts of speech
Noun {'cat': 2, 'water': 1}
adj {'slow': 1}
verb {'draw': 2}
Name: word, dtype: object
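To see why this works: agg hands each group's words to Counter as an iterable, and Counter tallies them. A minimal sketch using only the standard library (the list below is just a stand-in for the Noun group, not something pandas exposes under that name):

```python
from collections import Counter

# Stand-in for the 'word' values that fall into the Noun group
noun_words = ['cat', 'water', 'cat']

counts = Counter(noun_words)
print(counts)                 # Counter({'cat': 2, 'water': 1})
print(dict(counts))           # {'cat': 2, 'water': 1}
print(counts.most_common(1))  # [('cat', 2)]
```

Counter is a dict subclass, so the values stored in df2 behave like ordinary dictionaries; call dict(...) on them if you need plain dict objects.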
Alternative using value_counts (notice the to_dict call at the end):
df2 = df.groupby('Parts of speech')['word'].agg(lambda x: x.value_counts().to_dict())
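For reference, here is a self-contained sketch that rebuilds the sample data from the question and shapes the result into a two-column frame like the expected output. The way the frame is constructed, the sort=False argument, and the column name top are my reading of the question, so adjust them as needed.

```python
import pandas as pd

# Sample data reconstructed from the question
df = pd.DataFrame({
    'Parts of speech': ['Noun', 'Noun', 'Noun', 'verb', 'verb', 'adj'],
    'word': ['cat', 'water', 'cat', 'draw', 'draw', 'slow'],
})

# sort=False keeps the groups in order of first appearance (Noun, verb, adj)
df2 = (
    df.groupby('Parts of speech', sort=False)['word']
      .agg(lambda x: x.value_counts().to_dict())  # one dict per group
      .reset_index(name='top')                    # back to a DataFrame
)
print(df2)
```

The apply version in the question returns a whole Series per group, so pandas stitches those together into a single Series with a two-level index. With agg, each group is reduced to one object (here a dict), which is what the expected output asks for.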