Home > Enterprise >  How to run a cumulative/rolling average with a pandas groupby?
How to run a cumulative/rolling average with a pandas groupby?

Time:04-21

I have some ordered data that I want to .groupby() and obtain a cumulative/rolling average for each group. Here is an example of my data:

    player_id   gp  time    shot_zone_basic shot_made_flag
0   1630462 1   1616    Mid-Range   0
1   1630462 1   1851    Above the Break 3   0
2   1630462 1   1982    Restricted Area 0
3   1630462 1   2024    Right Corner 3  0
4   1630462 2   512 Restricted Area 0
5   1630462 2   714 Above the Break 3   0
6   1630462 2   1121    Restricted Area 0
7   1630462 3   354 Above the Break 3   1
8   1630462 3   401 Restricted Area 1
9   1630462 3   467 Restricted Area 1
10  1630462 3   564 Mid-Range   0
11  1630462 3   1116    Right Corner 3  0
12  1630462 3   1125    Right Corner 3  0
13  1630462 3   1194    Above the Break 3   0
14  1630462 3   1683    Above the Break 3   0
15  1630462 4   527 In The Paint (Non-RA)   0
16  1630462 4   683 Above the Break 3   1
17  1630462 4   779 Right Corner 3  0
18  1630462 4   1998    Restricted Area 1
19  1630462 5   898 Above the Break 3   1

Here is a dict of it to load:

{'gp': {0: 1, 1: 1, 2: 1, 3: 1, 4: 2, 5: 2, 6: 2, 7: 3, 8: 3, 9: 3, 10: 3, 
        11: 3, 12: 3, 13: 3, 14: 3, 15: 4, 16: 4, 17: 4, 18: 4, 19: 5}, 
 'player_id': {0: 1630462, 1: 1630462, 2: 1630462, 3: 1630462, 4: 1630462, 
               5: 1630462, 6: 1630462, 7: 1630462, 8: 1630462, 9: 1630462, 
               10: 1630462, 11: 1630462, 12: 1630462, 13: 1630462, 14: 1630462, 
               15: 1630462, 16: 1630462, 17: 1630462, 18: 1630462, 19: 1630462}, 
 'shot_made_flag': {0: 0, 1: 0, 2: 0, 3: 0, 4: 0, 5: 0, 6: 0, 7: 1, 8: 1, 9: 1, 
                    10: 0, 11: 0, 12: 0, 13: 0, 14: 0, 15: 0, 16: 1, 17: 0, 18: 1, 19: 1}, 
 'shot_zone_basic': {0: 'Mid-Range', 1: 'Above the Break 3', 2: 'Restricted Area', 
                     3: 'Right Corner 3', 4: 'Restricted Area', 5: 'Above the Break 3', 
                     6: 'Restricted Area', 7: 'Above the Break 3', 8: 'Restricted Area', 
                     9: 'Restricted Area', 10: 'Mid-Range', 11: 'Right Corner 3', 
                     12: 'Right Corner 3', 13: 'Above the Break 3', 14: 'Above the Break 3', 
                     15: 'In The Paint (Non-RA)', 16: 'Above the Break 3', 17: 'Right Corner 3', 
                     18: 'Restricted Area', 19: 'Above the Break 3'}, 
 'time': {0: 1616, 1: 1851, 2: 1982, 3: 2024, 4: 512, 5: 714, 6: 1121, 7: 354, 8: 401, 
          9: 467, 10: 564, 11: 1116, 12: 1125, 13: 1194, 14: 1683, 15: 527, 16: 683, 17: 
          779, 18: 1998, 19: 898}}

With the data in this order (ordered by player player_id, games played gp and time elapsed since the start of that respective game time), I'd like to group by player_id (there's more than 1 player_id in my full data), shot_zone_basic, and shot_made_flag to calculate a cumulative or rolling average of shot_made_flag for each shot_zone_basic for each player_id

My desired output for a cumulative average would thus be

    player_id   gp  time    shot_zone_basic  shot_made_flag     shot_zone_avg
0   1630462     1   1616    Mid-Range                     0                 0
1   1630462     1   1851    Above the Break 3             0                 0
2   1630462     1   1982    Restricted Area               0                 0
3   1630462     1   2024    Right Corner 3                0                 0
4   1630462     2   512     Restricted Area               0                 0
5   1630462     2   714     Above the Break 3             0                 0
6   1630462     2   1121    Restricted Area               0                 0
7   1630462     3   354     Above the Break 3             1               0.5
8   1630462     3   401     Restricted Area               1              0.33
9   1630462     3   467     Restricted Area               1               0.5                                
10  1630462     3   564     Mid-Range                     0                 0
11  1630462     3   1116    Right Corner 3                0                 0
12  1630462     3   1125    Right Corner 3                0                 0
13  1630462     3   1194    Above the Break 3             0              0.33
14  1630462     3   1683    Above the Break 3             0              0.25
15  1630462     4   527     In The Paint (Non-RA)         0                 0
16  1630462     4   683     Above the Break 3             1              0.20
17  1630462     4   779     Right Corner 3                0                 0
18  1630462     4   1998    Restricted Area               1              0.40
19  1630462     5   898     Above the Break 3             1              0.16

Again, I'd like to calculate a cumulative average for each shot_zone_basic for each player in my data. How would you also do this for a rolling average?

CodePudding user response:

IIUC, you could use groupby expanding.mean:

df['shot_zone_avg_of_until_current'] = df.groupby(['player_id', 'shot_zone_basic'])['shot_made_flag'].expanding().mean().droplevel([0,1]).round(2)
df['shot_zone_avg_of_until_current'] = df.groupby(['player_id', 'shot_zone_basic'])['shot_made_flag'].expanding().mean().groupby(level=[0,1]).shift().droplevel([0,1]).round(2)

Output:

    gp  player_id  shot_made_flag        shot_zone_basic  time  shot_zone_avg_of_until_current
0    1    1630462               0              Mid-Range  1616                             NaN
1    1    1630462               0      Above the Break 3  1851                             NaN
2    1    1630462               0        Restricted Area  1982                             NaN
3    1    1630462               0         Right Corner 3  2024                             NaN
4    2    1630462               0        Restricted Area   512                            0.00  
5    2    1630462               0      Above the Break 3   714                            0.00
6    2    1630462               0        Restricted Area  1121                            0.00
7    3    1630462               1      Above the Break 3   354                            0.00
8    3    1630462               1        Restricted Area   401                            0.00
9    3    1630462               1        Restricted Area   467                            0.25
10   3    1630462               0              Mid-Range   564                            0.00
11   3    1630462               0         Right Corner 3  1116                            0.00
12   3    1630462               0         Right Corner 3  1125                            0.00
13   3    1630462               0      Above the Break 3  1194                            0.33
14   3    1630462               0      Above the Break 3  1683                            0.25 
15   4    1630462               0  In The Paint (Non-RA)   527                             NaN
16   4    1630462               1      Above the Break 3   683                            0.20
17   4    1630462               0         Right Corner 3   779                            0.00
18   4    1630462               1        Restricted Area  1998                            0.40
19   5    1630462               1      Above the Break 3   898                            0.33
  • Related