this is a code i wrote, but the output is too big, over 6000, how do i get the first result for each year
df_year = df.groupby('release_year')['genres'].value_counts()
CodePudding user response:
Let's start from a small correction concerning variable name: value_counts returns a Series (not DataFrame), so you should not use name starting from df.
Assume that the variable holding this Series is gen.
Then, one of possible solutions is:
result = gen.groupby(level=0).apply(lambda grp:
grp.droplevel(0).sort_values(ascending=False).head(1))
Initially you wrote that you wanted the most popular genre in each year, so I sorted each group in descending order and returned the first row from the current group.