My data consists of ridership data for a fictional bike sharing company.
- My data is contained in a dataframe df
- member_type has two values - member or casual
- day_of_week has seven values - each day of the week
- ride_duration is how long a ride lasts for one session
I want to compare the average ride_duration for each member_type depending on day of the week.
df.groupby(['member_type', 'day_of_week'])['ride_duration'].mean()
This code gives me what I want except that the day_of_week is in alphabetical order and not chronological.
I tried the code below, but I think that because I'm grouping by two columns, the reindex doesn't work as I hoped.
cats = ['Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday', 'Saturday', 'Sunday']
df.groupby(['member_type', 'day_of_week'])['ride_duration'].mean().reindex(cats)
Additionally, if I were to sort by both member_type and day_of_week, how would I proceed? In this example I only have two member types, but say if I had 10 and wanted to sort them as well. Thanks in advance!
CodePudding user response:
You can try reindex with the level parameter:
df.groupby(['member_type', 'day_of_week'])['ride_duration'].mean().reindex(cats, level=1)
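As a rough, self-contained sketch of the reindex approach: the sample data below is made up purely for illustration, and the column names follow the question's description (member_type, day_of_week, ride_duration). Passing level=1 tells reindex to match cats against the second index level (day_of_week) only, instead of against the full two-level index.

```python
import pandas as pd

cats = ['Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday', 'Saturday', 'Sunday']

# Hypothetical sample data: two rides per member type per day.
rows = [
    (member, day, duration)
    for member in ['member', 'casual']
    for day in cats
    for duration in (10.0, 20.0)
]
df = pd.DataFrame(rows, columns=['member_type', 'day_of_week', 'ride_duration'])

# groupby sorts its keys, so day_of_week comes out alphabetically here.
means = df.groupby(['member_type', 'day_of_week'])['ride_duration'].mean()

# Reorder only the second index level (day_of_week) to follow cats.
ordered = means.reindex(cats, level=1)
print(ordered)
```

Without level=1, reindex compares the day names against whole (member_type, day_of_week) tuples, which is why the attempt in the question does not give the expected result.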
However, I would suggest converting the day_of_week column to an ordered categorical dtype before doing the groupby, so that you don't have to reindex later. The member_type column can be converted to an ordered categorical in the same way. For day_of_week:
df['day_of_week'] = pd.Categorical(df['day_of_week'], cats, ordered=True)
df.groupby(['member_type', 'day_of_week'])['ride_duration'].mean()
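To cover the second part of the question (sorting by both columns), here is a minimal sketch that makes both grouping columns ordered categoricals. The sample data and the member_order list are assumptions made up for illustration. With more member types, you would list all of them in member_order in the order you want.

```python
import pandas as pd

cats = ['Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday', 'Saturday', 'Sunday']
# Hypothetical custom order for member types (deliberately not alphabetical).
member_order = ['member', 'casual']

# Hypothetical sample data: two rides per member type per day.
rows = [
    (member, day, duration)
    for member in ['casual', 'member']
    for day in sorted(cats)
    for duration in (10.0, 20.0)
]
df = pd.DataFrame(rows, columns=['member_type', 'day_of_week', 'ride_duration'])

# Ordered categoricals carry their own sort order, which groupby respects.
df['day_of_week'] = pd.Categorical(df['day_of_week'], categories=cats, ordered=True)
df['member_type'] = pd.Categorical(df['member_type'], categories=member_order, ordered=True)

# observed=True keeps only the combinations that actually occur in the data.
result = df.groupby(['member_type', 'day_of_week'], observed=True)['ride_duration'].mean()
print(result)
```

Because the order lives in the column dtype, later operations such as sort_values or further groupby calls on these columns follow the same order without another reindex.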