Pandas - create a tuple of the most frequent words using groupby

Time:10-27

I have a dataframe with the columns 'Parts of speech' and 'word':

df:
      Parts of speech  word
    0 Noun             cat
    1 Noun             water
    2 Noun             cat
    3 verb             draw
    4 verb             draw
    5 adj              slow

I want to group the top words by part of speech (what I expect):

Parts of speech  top
Noun             {'cat':2,'water':1}
verb             {'draw':2}
adj              {'slow':1}

I tried groupby with apply, but I don't get what I need:

df2=df.groupby('Parts of speech')['word'].apply(lambda x : x.value_counts())

How do I create a tuple for each part of speech?

CodePudding user response:

One approach is to aggregate with .agg, passing collections.Counter:

from collections import Counter
df2=df.groupby('Parts of speech')['word'].agg(Counter)
print(df2)

Output

Parts of speech
Noun    {'cat': 2, 'water': 1}
adj                {'slow': 1}
verb               {'draw': 2}
Name: word, dtype: object
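
For anyone who wants to try this end to end, here is a minimal self-contained sketch. The DataFrame is rebuilt by hand from the table in the question, and the variable name attempt is only illustrative. It also runs the question's original apply call for comparison: that call returns one long Series indexed by (part of speech, word) rather than one row per group, which is why it did not match the expected output.

```python
from collections import Counter

import pandas as pd

# Sample data rebuilt from the table in the question
df = pd.DataFrame({
    "Parts of speech": ["Noun", "Noun", "Noun", "verb", "verb", "adj"],
    "word": ["cat", "water", "cat", "draw", "draw", "slow"],
})

# The question's attempt: apply() concatenates each group's value_counts(),
# giving a Series with a two-level index instead of one row per group
attempt = df.groupby("Parts of speech")["word"].apply(lambda x: x.value_counts())

# agg(Counter) calls Counter on each group's words and keeps one object per group
df2 = df.groupby("Parts of speech")["word"].agg(Counter)

# Each cell is a Counter (a dict subclass), so normal dict access works
print(df2["Noun"]["cat"])  # 2
```

Because each cell holds a Counter, methods such as most_common are available on the individual values as well.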

Alternative using value_counts (notice the to_dict call at the end):

df2 = df.groupby('Parts of speech')['word'].agg(lambda x: x.value_counts().to_dict())
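
If an actual tuple is wanted, as the title suggests, one possible follow-up is to convert each dict into a tuple of (word, count) pairs. This is only a sketch: the names counts and top are illustrative, and the sample DataFrame is again rebuilt from the question. It relies on value_counts sorting by count in descending order and on to_dict preserving that order.

```python
import pandas as pd

# Sample data rebuilt from the table in the question
df = pd.DataFrame({
    "Parts of speech": ["Noun", "Noun", "Noun", "verb", "verb", "adj"],
    "word": ["cat", "water", "cat", "draw", "draw", "slow"],
})

# One dict per part of speech; value_counts() puts the most frequent word first
counts = df.groupby("Parts of speech")["word"].agg(
    lambda x: x.value_counts().to_dict()
)

# Convert each dict into a tuple of (word, count) pairs, most frequent first
top = counts.map(lambda d: tuple(d.items()))

print(top["Noun"])
```

Keeping the conversion in a separate map step leaves the dict version available for lookups by word.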