Home > database >  Error applying a weighted rolling average by group in Python
Error applying a weighted rolling average by group in Python

Time:05-03

I have the following dataframe for which I'm trying to compute a weighted rolling average:

import pandas as pd
df = pd.DataFrame({'player_ID': {0: 123,
  1: 123,
  2: 123,
  3: 123,
  4: 123,
  5: 456,
  6: 456,
  7: 456,
  8: 456,
  9: 456},
 'hole_sg': {0: 3.14,
  1: 2.70,
  2: 5.20,
  3: -0.02,
  4: 2.09,
  5: -2.92,
  6: -2.01,
  7: 3.02,
  8: -0.72,
  9: -1.77}})

This array indicates the weights that I want to apply to the rolling averages:

import numpy as np
weights = np.array([0.66342043, 0.6983373 , 0.73509189])

The code that I used to calculate weighted averages (shown below) works when I don't group, but I want to be able to group by 'player_ID'

df['rolling'] = df.groupby('player_ID')['hole_sg'].rolling(3).apply(lambda x: (np.dot(x, weights)/weights).sum())

When I try to group I get the following error:

TypeError: incompatible index of inserted column with frame index

Does anyone have any suggestions on how to fix this? If everything worked properly, this is the result I would get:

enter image description here

CodePudding user response:

When you groupby and use rolling you get a MultiIndex. To align with the original DataFrame, you can use:

df["rolling"] = df.groupby('player_ID')['hole_sg'].rolling(3).apply(lambda x: (np.dot(x, weights))/weights.sum()).droplevel(0)

>>> df
   player_ID  hole_sg   rolling
0        123     3.14       NaN
1        123     2.70       NaN
2        123     5.20  3.715635
3        123    -0.02  2.579053
4        123     2.09  2.371253
5        456    -2.92       NaN
6        456    -2.01       NaN
7        456     3.02 -0.534549
8        456    -0.72  0.117432
9        456    -1.77  0.095197
  • Related