Group by week and rank to pick top 5-CodePudding

I have a DataFrame with one month's data:

initiated_date      | specialist_id  |  rating
21/10/2020 05:00:01 |   ab12         |    8.1
21/10/2020 12:20:01 |   gc35         |    7.3         
22/10/2020 04:30:01 |   ad32         |    6.4
22/10/2020 03:40:01 |   fe45         |    9.2
22/10/2020 01:50:01 |   ad32         |    8.2
23/10/2020 02:10:01 |   iu99         |    8.7
23/10/2020 11:30:01 |   iu99         |    7.9

I want to breakdown per week for the top 5 specialists with respect to average rating. I am trying to replicate the rank partition by function of sql here.

What I am looking for is:

Week  |  specialist_id |  avg.rating   |  ranking 
22    |   ab12         |    9.8        |    1
22    |   gb22         |    9.1        |    2
22    |   po90         |    8.8        |    3
22    |   nh78         |    8.3        |    4
22    |   hj89         |    8.1        |    5
23    |   bg77         |    9.7        |    1
23    |   ab12         |    9.5        |    2

This is what I am trying to do

df.groupby([df['initiated_date'].dt.week])['rating'].mean().reset_index()

I am not sure how to go further.

CodePudding user response：

Let us do in steps:

groupby week and specialist_id and calculate avg_rating
groupby week and assign the numerical rank for avg_rating per week
(Optional) sort the values by week and rank

w = df.initiated_date.dt.isocalendar().week
df1 = df.groupby([w, 'specialist_id'])['rating'].mean().reset_index(name='avg_rating')
df1['rank'] = df1.groupby('week', as_index=False)['avg_rating'].rank(method='dense')
df1 = df1.sort_values(['week', 'rank'])

print(df1)
   week specialist_id  avg_rating  rank
2    43          fe45         9.2   1.0
4    43          iu99         8.3   2.0
0    43          ab12         8.1   3.0
1    43          ad32         7.3   4.0
...

CodePudding user response：

You need a double groupby:

(df
 .assign(week=pd.to_datetime(df['initiated_date']).dt.isocalendar().week)
 .groupby(['week', 'specialist_id'])
 .agg(avg_rating=('rating', 'mean'))
 .assign(rank=lambda d: d.groupby('week')['avg_rating'].rank(method='dense', ascending=False))
 .query('rank <= 5')
)