I have the following dataframe:
HorseId | FGrating |
---|---|
25620 | 112 |
25620 | 108 |
25620 | 111 |
25620 | 98 |
25620 | 103 |
25620 | 96 |
I want to compute the rolling average of the FGrating
column according to these rules:
- If I have 4 or more races of a certain
HorseId
, then I compute the rolling average for the last 4 races; - If I have less than 4 races, then I compute the rolling average for the races that I have so far.
The average should be displayed on the following row of the rows used to compute the aforementioned metric. As an example: the rolling average for the first 4 races should be displayed on the 5th row. So, on the first row the correct value is 0
What I am looking for is a table like this:
HorseId | FGrating | Rolling average |
---|---|---|
25620 | 112 | 0 |
25620 | 108 | 112 (112/1) |
25620 | 111 | 110 (112 108)/2 |
25620 | 98 | 110.33 (112 108 111)/3 |
25620 | 103 | 107.25 (112 108 111 98)/4 |
25620 | 96 | 105 (108 111 98 103)/4 |
So far, I got to the rolling average of the last 4 races, like this:
df.groupby('HorseId')['FGrating'].apply(lambda x: x.shift().expanding(min_periods=no_starts).mean())
But I am stuck with the rest. Can you help me?
CodePudding user response:
Maybe try to use rolling and shift :
df['Rolling_Average'] = df.groupby('HorseID')['FGrating']/
.shift(1).transform(lambda x: x.rolling(4, 1).mean())
rolling creates a window, which could be use to calculate moving average, and shift shifts the row (so it wouldn't take actual row)
HorseID FGrating Rolling_Average
0 25620 112 NaN
1 25620 108 112.000
2 25620 111 110.000
3 25620 98 110.333
4 25620 103 107.250
5 25620 96 105.000
CodePudding user response:
Use rolling_mean
:
df['Rolling_Average'] = df.groupby('HorseId')['FGrating'].rolling(4, min_periods=1) \
.mean().shift(fill_value=0).droplevel(0)
print(df)
# Output:
HorseId FGrating Rolling average Rolling_Average
0 25620 112 0 NaN
1 25620 108 112 (112/1) 112.000000
2 25620 111 110 (112 108)/2 110.000000
3 25620 98 110.33 (112 108 111)/3 110.333333
4 25620 103 107.25 (112 108 111 98)/4 107.250000
5 25620 96 105 (108 111 98 103)/4 105.000000