How can I determine the average of a column up to this point, using groupby?

I have the following data frame:

HorseId	FGrating	Average FGrating
1736	110	-1
1736	124	-1
1736	118	-1
13973	144	-1
13973	137	-1

I want to fill the Average FGrating column with the average FGrating of every horse up to the point it was computed, grouped by HorseId. The result that I am looking for is this:

HorseId	FGrating	Average FGrating
1736	110	110
1736	124	117 (110 + 124)/2
1736	118	117.3 (110 + 124 + 118)/3
13973	144	144
13973	137	140.5 (144 + 137)/2

The code I used to solve this problem is:

featured_data['Average FGrating'] = featured_data[['HorseId', 'FGrating']].groupby('HorseId')[
    'FGrating'].mean()

However, this fills in a value for only part of the data frame, not all of it.

What am I doing wrong? How can I solve this problem?

CodePudding user response:

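We can start by grouping by HorseId and taking the cumulative sum (cumsum) of FGrating. Dividing that by a cumulative count (cumcount) gives the running average. Because of the shift, each row's cum_mean below is the average of that horse's earlier ratings; the first row of each horse has no earlier rating, so it is filled with its own FGrating: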

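>>> df_grouped = df.groupby('HorseId')['FGrating']
>>> # transform keeps df's row index; the shift leaves the current row out of the sum
>>> df['cum_sum'] = df_grouped.transform(lambda p: p.shift(fill_value=0).cumsum())
>>> df['cum_mean'] = df['cum_sum'] / df_grouped.cumcount()
>>> # the first row of each horse is 0/0 = NaN, so fall back to its own rating
>>> df['cum_mean'] = df['cum_mean'].fillna(df['FGrating'])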
>>> df
    HorseId     FGrating    Average FGrating    cum_sum     cum_mean
0   1736        110         -1                  0           110.0
1   1736        124         -1                  110         110.0
2   1736        118         -1                  234         117.0
3   13973       144         -1                  0           144.0
4   13973       137         -1                  144         144.0
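
To include the current row in the average, as in the table from the question, one option is an expanding mean without the shift. This is a minimal sketch, assuming the column names from the question; the frame is rebuilt here only so the snippet runs on its own:

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "HorseId": [1736, 1736, 1736, 13973, 13973],
    "FGrating": [110, 124, 118, 144, 137],
})

# transform returns one value per original row, indexed like df,
# so the result can be assigned straight back as a column.
df["Average FGrating"] = (
    df.groupby("HorseId")["FGrating"]
      .transform(lambda s: s.expanding().mean())
)
print(df)
```

For the sample rows this should give 110, 117, 117.33 for horse 1736 and 144, 140.5 for horse 13973.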

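The previous-rows average can also be computed more briefly; this version leaves the first row of each horse as NaN instead of filling it:

df['cum_mean'] = (
    df.groupby('HorseId')['FGrating'].transform(lambda x: x.shift().expanding().mean()))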
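
As for what goes wrong in the question's code: groupby(...).mean() returns a single overall mean per horse, indexed by HorseId rather than by row. When that Series is assigned to a column, pandas aligns it on index labels, so only rows whose label happens to equal a HorseId get a value. That is presumably why only part of the real data frame was filled. A small sketch on the sample data, where no row label matches a horse id:

```python
import pandas as pd

# Sample data copied from the question.
featured_data = pd.DataFrame({
    "HorseId": [1736, 1736, 1736, 13973, 13973],
    "FGrating": [110, 124, 118, 144, 137],
})

# One mean per horse; the index holds horse ids, not row numbers.
per_horse = featured_data.groupby("HorseId")["FGrating"].mean()
print(per_horse.index.tolist())

# Assignment aligns on labels: rows 0..4 never match 1736 or 13973,
# so every entry of the new column is NaN.
featured_data["Average FGrating"] = per_horse
print(featured_data["Average FGrating"].isna().all())
```

Using transform (or cumsum and cumcount) on the grouped column avoids this, because the result keeps the original row index.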