Pandas - grouping by multiple columns and sorting with one of them

My data consists of ridership data for a fictional bike sharing company.

My data is contained in a dataframe df
member_type has two values - member or casual
day_of_week has seven values - each day of the week
ride_duration is how long a ride lasts for one session

I want to compare the average ride_duration for each member_type depending on day of the week.

df.groupby(['member_type', 'day_of_week'])['ride_duration'].mean()

This code gives me what I want except that the day_of_week is in alphabetical order and not chronological.

I tried the code below, but I think that because I'm grouping by two columns the result has a two-level index, so the reindex doesn't work as I hoped.

cats = ['Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday', 'Saturday', 'Sunday']
df.groupby(['member_type', 'day_of_week'])['ride_duration'].mean().reindex(cats)

Additionally, if I were to sort by both member_type and day_of_week, how would I proceed? In this example I only have two member types, but say if I had 10 and wanted to sort them as well. Thanks in advance!

CodePudding user response:

You can pass the level parameter to reindex so that it reorders only the day_of_week level of the MultiIndex:

df.groupby(['member_type', 'day_of_week'])['ride_duration'].mean().reindex(cats, level=1)
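
To see the effect of level=1 in isolation, here is a minimal sketch on a small made-up dataframe. The sample rows and the variable names means and ordered are assumptions for illustration only, not the asker's real data.

```python
import pandas as pd

# Hypothetical sample data standing in for the real ridership dataframe.
df = pd.DataFrame({
    "member_type": ["member", "casual", "member", "casual", "member", "casual"],
    "day_of_week": ["Wednesday", "Monday", "Monday", "Sunday", "Sunday", "Wednesday"],
    "ride_duration": [10.0, 30.0, 12.0, 45.0, 20.0, 25.0],
})

cats = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday"]

# Grouping by two columns produces a Series with a two-level MultiIndex.
means = df.groupby(["member_type", "day_of_week"])["ride_duration"].mean()

# level=1 applies the new order to the second index level (day_of_week)
# and leaves the first level (member_type) as it was.
ordered = means.reindex(cats, level=1)
print(ordered)
```

In this sketch the days come out as Monday, Wednesday, Sunday within each member type. Days that never occur in the data (Tuesday, for example) are not added as empty rows.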

However, I would suggest converting the day_of_week column to an ordered categorical before doing the groupby, so that the result already comes out in chronological order and you don't have to reindex later. The same approach works for the member_type column if you want a custom order there too:

import pandas as pd
df['day_of_week'] = pd.Categorical(df['day_of_week'], categories=cats, ordered=True)
df.groupby(['member_type', 'day_of_week'])['ride_duration'].mean()
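
For the follow-up question about sorting by both columns, one possible sketch is to make both of them ordered categoricals. The sample rows and the member_order list are made up for illustration, and passing observed=True is my own choice here rather than something from the question.

```python
import pandas as pd

# Hypothetical sample data standing in for the real ridership dataframe.
df = pd.DataFrame({
    "member_type": ["member", "casual", "member", "casual", "member", "casual"],
    "day_of_week": ["Wednesday", "Monday", "Monday", "Sunday", "Sunday", "Wednesday"],
    "ride_duration": [10.0, 30.0, 12.0, 45.0, 20.0, 25.0],
})

day_order = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday"]
# Assumed custom order for illustration: members first, then casual riders.
member_order = ["member", "casual"]

# Ordered categoricals sort by the position in `categories`, not alphabetically.
df["day_of_week"] = pd.Categorical(df["day_of_week"], categories=day_order, ordered=True)
df["member_type"] = pd.Categorical(df["member_type"], categories=member_order, ordered=True)

# observed=True keeps only the category combinations that occur in the data.
means = df.groupby(["member_type", "day_of_week"], observed=True)["ride_duration"].mean()
print(means)
```

With ten member types you would only need a longer member_order list. If you would rather see a row for every member type and day combination, including the ones with no rides, use observed=False instead.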