Home > front end >  How can I compute the rolling average of a column up to a certain number of rows?
How can I compute the rolling average of a column up to a certain number of rows?

Time:01-05

I have the following dataframe:

HorseId FGrating
25620 112
25620 108
25620 111
25620 98
25620 103
25620 96

I want to compute the rolling average of the FGrating column according to these rules:

  1. If I have 4 or more races of a certain HorseId, then I compute the rolling average for the last 4 races;
  2. If I have less than 4 races, then I compute the rolling average for the races that I have so far.

The average should be displayed on the following row of the rows used to compute the aforementioned metric. As an example: the rolling average for the first 4 races should be displayed on the 5th row. So, on the first row the correct value is 0

What I am looking for is a table like this:

HorseId FGrating Rolling average
25620 112 0
25620 108 112 (112/1)
25620 111 110 (112 108)/2
25620 98 110.33 (112 108 111)/3
25620 103 107.25 (112 108 111 98)/4
25620 96 105 (108 111 98 103)/4

So far, I got to the rolling average of the last 4 races, like this:

df.groupby('HorseId')['FGrating'].apply(lambda x: x.shift().expanding(min_periods=no_starts).mean())

But I am stuck with the rest. Can you help me?

CodePudding user response:

Maybe try to use rolling and shift :

df['Rolling_Average'] = df.groupby('HorseID')['FGrating']/
    .shift(1).transform(lambda x: x.rolling(4, 1).mean())

rolling creates a window, which could be use to calculate moving average, and shift shifts the row (so it wouldn't take actual row)

    HorseID FGrating    Rolling_Average
0   25620   112         NaN
1   25620   108         112.000
2   25620   111         110.000
3   25620   98          110.333
4   25620   103         107.250
5   25620   96          105.000

CodePudding user response:

Use rolling_mean:

df['Rolling_Average'] = df.groupby('HorseId')['FGrating'].rolling(4, min_periods=1) \
                          .mean().shift(fill_value=0).droplevel(0)
print(df)

# Output:
   HorseId  FGrating            Rolling average  Rolling_Average
0    25620       112                          0              NaN
1    25620       108                112 (112/1)       112.000000
2    25620       111            110 (112 108)/2       110.000000
3    25620        98     110.33 (112 108 111)/3       110.333333
4    25620       103  107.25 (112 108 111 98)/4       107.250000
5    25620        96     105 (108 111 98 103)/4       105.000000
  •  Tags:  
  • Related