I have a DataFrame with one month's data:
initiated_date | specialist_id | rating
21/10/2020 05:00:01 | ab12 | 8.1
21/10/2020 12:20:01 | gc35 | 7.3
22/10/2020 04:30:01 | ad32 | 6.4
22/10/2020 03:40:01 | fe45 | 9.2
22/10/2020 01:50:01 | ad32 | 8.2
23/10/2020 02:10:01 | iu99 | 8.7
23/10/2020 11:30:01 | iu99 | 7.9
I want to breakdown per week for the top 5 specialists with respect to average rating. I am trying to replicate the rank partition by function of sql here.
What I am looking for is:
Week | specialist_id | avg.rating | ranking
22 | ab12 | 9.8 | 1
22 | gb22 | 9.1 | 2
22 | po90 | 8.8 | 3
22 | nh78 | 8.3 | 4
22 | hj89 | 8.1 | 5
23 | bg77 | 9.7 | 1
23 | ab12 | 9.5 | 2
This is what I am trying to do
df.groupby([df['initiated_date'].dt.week])['rating'].mean().reset_index()
I am not sure how to go further.
CodePudding user response:
Let us do in steps:
- groupby
week
andspecialist_id
and calculateavg_rating
- groupby
week
and assign the numerical rank foravg_rating
perweek
- (Optional) sort the values by
week
andrank
w = df.initiated_date.dt.isocalendar().week
df1 = df.groupby([w, 'specialist_id'])['rating'].mean().reset_index(name='avg_rating')
df1['rank'] = df1.groupby('week', as_index=False)['avg_rating'].rank(method='dense')
df1 = df1.sort_values(['week', 'rank'])
print(df1)
week specialist_id avg_rating rank
2 43 fe45 9.2 1.0
4 43 iu99 8.3 2.0
0 43 ab12 8.1 3.0
1 43 ad32 7.3 4.0
...
CodePudding user response:
You need a double groupby
:
(df
.assign(week=pd.to_datetime(df['initiated_date']).dt.isocalendar().week)
.groupby(['week', 'specialist_id'])
.agg(avg_rating=('rating', 'mean'))
.assign(rank=lambda d: d.groupby('week')['avg_rating'].rank(method='dense', ascending=False))
.query('rank <= 5')
)