I have some ordered data that I want to .groupby()
and obtain a cumulative/rolling average for each group. Here is an example of my data:
player_id gp time shot_zone_basic shot_made_flag
0 1630462 1 1616 Mid-Range 0
1 1630462 1 1851 Above the Break 3 0
2 1630462 1 1982 Restricted Area 0
3 1630462 1 2024 Right Corner 3 0
4 1630462 2 512 Restricted Area 0
5 1630462 2 714 Above the Break 3 0
6 1630462 2 1121 Restricted Area 0
7 1630462 3 354 Above the Break 3 1
8 1630462 3 401 Restricted Area 1
9 1630462 3 467 Restricted Area 1
10 1630462 3 564 Mid-Range 0
11 1630462 3 1116 Right Corner 3 0
12 1630462 3 1125 Right Corner 3 0
13 1630462 3 1194 Above the Break 3 0
14 1630462 3 1683 Above the Break 3 0
15 1630462 4 527 In The Paint (Non-RA) 0
16 1630462 4 683 Above the Break 3 1
17 1630462 4 779 Right Corner 3 0
18 1630462 4 1998 Restricted Area 1
19 1630462 5 898 Above the Break 3 1
Here is a dict of it to load:
{'gp': {0: 1, 1: 1, 2: 1, 3: 1, 4: 2, 5: 2, 6: 2, 7: 3, 8: 3, 9: 3, 10: 3,
11: 3, 12: 3, 13: 3, 14: 3, 15: 4, 16: 4, 17: 4, 18: 4, 19: 5},
'player_id': {0: 1630462, 1: 1630462, 2: 1630462, 3: 1630462, 4: 1630462,
5: 1630462, 6: 1630462, 7: 1630462, 8: 1630462, 9: 1630462,
10: 1630462, 11: 1630462, 12: 1630462, 13: 1630462, 14: 1630462,
15: 1630462, 16: 1630462, 17: 1630462, 18: 1630462, 19: 1630462},
'shot_made_flag': {0: 0, 1: 0, 2: 0, 3: 0, 4: 0, 5: 0, 6: 0, 7: 1, 8: 1, 9: 1,
10: 0, 11: 0, 12: 0, 13: 0, 14: 0, 15: 0, 16: 1, 17: 0, 18: 1, 19: 1},
'shot_zone_basic': {0: 'Mid-Range', 1: 'Above the Break 3', 2: 'Restricted Area',
3: 'Right Corner 3', 4: 'Restricted Area', 5: 'Above the Break 3',
6: 'Restricted Area', 7: 'Above the Break 3', 8: 'Restricted Area',
9: 'Restricted Area', 10: 'Mid-Range', 11: 'Right Corner 3',
12: 'Right Corner 3', 13: 'Above the Break 3', 14: 'Above the Break 3',
15: 'In The Paint (Non-RA)', 16: 'Above the Break 3', 17: 'Right Corner 3',
18: 'Restricted Area', 19: 'Above the Break 3'},
'time': {0: 1616, 1: 1851, 2: 1982, 3: 2024, 4: 512, 5: 714, 6: 1121, 7: 354, 8: 401,
9: 467, 10: 564, 11: 1116, 12: 1125, 13: 1194, 14: 1683, 15: 527, 16: 683, 17:
779, 18: 1998, 19: 898}}
With the data in this order (ordered by player player_id
, games played gp
and time elapsed since the start of that respective game time
), I'd like to group by player_id
(there's more than 1 player_id in my full data), shot_zone_basic
, and shot_made_flag
to calculate a cumulative or rolling average of shot_made_flag
for each shot_zone_basic
for each player_id
My desired output for a cumulative average would thus be
player_id gp time shot_zone_basic shot_made_flag shot_zone_avg
0 1630462 1 1616 Mid-Range 0 0
1 1630462 1 1851 Above the Break 3 0 0
2 1630462 1 1982 Restricted Area 0 0
3 1630462 1 2024 Right Corner 3 0 0
4 1630462 2 512 Restricted Area 0 0
5 1630462 2 714 Above the Break 3 0 0
6 1630462 2 1121 Restricted Area 0 0
7 1630462 3 354 Above the Break 3 1 0.5
8 1630462 3 401 Restricted Area 1 0.33
9 1630462 3 467 Restricted Area 1 0.5
10 1630462 3 564 Mid-Range 0 0
11 1630462 3 1116 Right Corner 3 0 0
12 1630462 3 1125 Right Corner 3 0 0
13 1630462 3 1194 Above the Break 3 0 0.33
14 1630462 3 1683 Above the Break 3 0 0.25
15 1630462 4 527 In The Paint (Non-RA) 0 0
16 1630462 4 683 Above the Break 3 1 0.20
17 1630462 4 779 Right Corner 3 0 0
18 1630462 4 1998 Restricted Area 1 0.40
19 1630462 5 898 Above the Break 3 1 0.16
Again, I'd like to calculate a cumulative average for each shot_zone_basic
for each player in my data. How would you also do this for a rolling average?
CodePudding user response:
IIUC, you could use groupby
expanding.mean
:
df['shot_zone_avg_of_until_current'] = df.groupby(['player_id', 'shot_zone_basic'])['shot_made_flag'].expanding().mean().droplevel([0,1]).round(2)
df['shot_zone_avg_of_until_current'] = df.groupby(['player_id', 'shot_zone_basic'])['shot_made_flag'].expanding().mean().groupby(level=[0,1]).shift().droplevel([0,1]).round(2)
Output:
gp player_id shot_made_flag shot_zone_basic time shot_zone_avg_of_until_current
0 1 1630462 0 Mid-Range 1616 NaN
1 1 1630462 0 Above the Break 3 1851 NaN
2 1 1630462 0 Restricted Area 1982 NaN
3 1 1630462 0 Right Corner 3 2024 NaN
4 2 1630462 0 Restricted Area 512 0.00
5 2 1630462 0 Above the Break 3 714 0.00
6 2 1630462 0 Restricted Area 1121 0.00
7 3 1630462 1 Above the Break 3 354 0.00
8 3 1630462 1 Restricted Area 401 0.00
9 3 1630462 1 Restricted Area 467 0.25
10 3 1630462 0 Mid-Range 564 0.00
11 3 1630462 0 Right Corner 3 1116 0.00
12 3 1630462 0 Right Corner 3 1125 0.00
13 3 1630462 0 Above the Break 3 1194 0.33
14 3 1630462 0 Above the Break 3 1683 0.25
15 4 1630462 0 In The Paint (Non-RA) 527 NaN
16 4 1630462 1 Above the Break 3 683 0.20
17 4 1630462 0 Right Corner 3 779 0.00
18 4 1630462 1 Restricted Area 1998 0.40
19 5 1630462 1 Above the Break 3 898 0.33