Grouping and selecting the n most repeated values in each group

Time:01-16

I have data from a TV game show that records the round and the question category for each question. I grouped the questions by round and counted the categories, then tried to keep only the top n with head(n):

data.groupby(['Round']).Category.value_counts()
data.groupby(['Round']).Category.value_counts().head(n)

When I call head(n), it only returns the first n rows of the whole result, which all come from the first group. I would like to get the n most frequent categories in each group.

How can I do this?
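To make the problem concrete, here is a minimal sketch on made-up data (the round names, categories, and n = 2 are assumptions for illustration, not the asker's dataset). The grouped value_counts() result is a single Series with a (Round, Category) MultiIndex, so head(n) just slices the first n rows of that Series:

```python
import pandas as pd

# Hypothetical sample data; the real dataset is not shown in the question.
data = pd.DataFrame({
    "Round": ["Round 1"] * 6 + ["Round 2"] * 6,
    "Category": ["History", "History", "History", "Science", "Science", "Art",
                 "Sports", "Sports", "Sports", "Music", "Music", "Film"],
})
n = 2  # assumed value for illustration

# One Series indexed by (Round, Category), counts sorted within each round.
counts = data.groupby(["Round"]).Category.value_counts()

# head(n) is applied to the whole Series, not per group,
# so every returned row belongs to the first round.
first_rows = counts.head(n)
rounds_seen = first_rows.index.get_level_values(0).unique().tolist()
print(first_rows)
```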

CodePudding user response:

You can reverse the order of operations: count the (Round, Category) pairs first, then take the top n rows per "Round":

data[["Round", "Category"]].value_counts().groupby(level="Round").head(n)
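A small runnable sketch of this approach, assuming pandas 1.1 or later (where DataFrame.value_counts is available) and using made-up data and n = 2. Because value_counts() sorts the counts in descending order, the first n rows of each "Round" group are that round's most frequent categories:

```python
import pandas as pd

# Hypothetical sample data; round and category names are illustrative only.
data = pd.DataFrame({
    "Round": ["Round 1"] * 6 + ["Round 2"] * 6,
    "Category": ["History", "History", "History", "Science", "Science", "Art",
                 "Sports", "Sports", "Sports", "Music", "Music", "Film"],
})
n = 2  # assumed value for illustration

# Count each (Round, Category) pair; the result is sorted by count, descending.
pair_counts = data[["Round", "Category"]].value_counts()

# GroupBy.head(n) keeps the first n rows of each group in their existing order,
# which here means the n highest counts per round.
top_n = pair_counts.groupby(level="Round").head(n)
print(top_n)
```

Ties at the cutoff are resolved by row order, so if two categories share the n-th highest count only one of them is kept.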

CodePudding user response:

You can use groupby.apply here:

data.groupby('Round')['Category'].apply(lambda g: g.value_counts().head(n))
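The same idea as a runnable sketch, again on made-up data with an assumed n = 2. Here value_counts() runs separately on each round's "Category" column, and head(n) trims each per-round result before pandas combines them into one Series:

```python
import pandas as pd

# Hypothetical sample data; round and category names are illustrative only.
data = pd.DataFrame({
    "Round": ["Round 1"] * 6 + ["Round 2"] * 6,
    "Category": ["History", "History", "History", "Science", "Science", "Art",
                 "Sports", "Sports", "Sports", "Music", "Music", "Film"],
})
n = 2  # assumed value for illustration

# For each round, count its categories and keep the n most frequent.
# The pieces are concatenated into a Series indexed by (Round, category).
top_n = data.groupby("Round")["Category"].apply(
    lambda g: g.value_counts().head(n)
)
print(top_n)
```

Both answers should select the same rows on data like this; the apply version calls a Python function once per group, so it may be slower when there are many groups.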