I have a dataframe that shows each audience's ranking for a bunch of movies. I wanted to make a list of movies with the most ratings, for each gender. Here's what I did:
most_rated_gen=lens.groupby(['sex','title']).size().sort_values(ascending=False).to_frame()
I was expecting to see a dataframe that looks something like this:
sex | title
M A
B
C
D
F B
C
D
A
I don't know why it shows M F M F M. Any ideas how I could fix this?
CodePudding user response:
As you group by sex
, the output will contain the sex
column.
You have a shortcut for your operation with value_counts
:
df.value_counts(['sex', 'title']).sort_index(kind='mergesort')
If you want your data to be sorted by index while preserving the order of values then you have to use sort_index
with kind='mergesort'
as parameter.
CodePudding user response:
You can use nlargest()
if your aggregated
column has a name. Assuming the column name is ratings_count
. You can use this code.
most_rated_gen.groupby(['sex'])['ratings_count'].nlargest()