How can I determine the average of a column up to this point, using groupby?

Time:11-14

I have the following data frame:

HorseId FGrating Average FGrating
1736 110 -1
1736 124 -1
1736 118 -1
13973 144 -1
13973 137 -1

I want to fill the Average FGrating column with the average FGrating of every horse up to the point it was computed, grouped by HorseId. The result that I am looking for is this:

HorseId FGrating Average FGrating
1736 110 110
1736 124 117 (110 + 124)/2
1736 118 117.3 (110 + 124 + 118)/3
13973 144 144
13973 137 140.5 (144 + 137)/2

The code I used to solve this problem is:

featured_data['Average FGrating'] = featured_data[['HorseId', 'FGrating']].groupby('HorseId')[
    'FGrating'].mean()

However, this fills in the average for only part of the data frame, not all of it.

What am I doing wrong? How can I solve this problem?

CodePudding user response:

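We can start by grouping by HorseId and taking the cumulative sum (cumsum) of FGrating. Dividing that cumsum by a cumcount gives the running average. Note that the shift below excludes the current row, so each row gets the average of the horse's previous ratings; the first row of each horse has no previous rating and falls back to its own FGrating: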

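>>> df_grouped = df.groupby('HorseId')['FGrating']
>>> # transform keeps the original row index, so the result can be assigned directly
>>> df['cum_sum'] = df_grouped.transform(lambda p: p.shift(fill_value=0).cumsum())
>>> df['cum_mean'] = df['cum_sum'] / df_grouped.cumcount()
>>> # the first row of each horse is 0/0 = NaN, so fall back to its own rating
>>> df['cum_mean'] = df['cum_mean'].fillna(df['FGrating'])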
>>> df
    HorseId     FGrating    Average FGrating    cum_sum     cum_mean
0   1736        110         -1                  0           110.0
1   1736        124         -1                  110         110.0
2   1736        118         -1                  234         117.0
3   13973       144         -1                  0           144.0
4   13973       137         -1                  144         144.0

Or, more concisely (here the first row of each horse stays NaN instead of falling back to its own rating):

df['cum_mean'] = (
    df.groupby('HorseId')['FGrating'].transform(lambda x: x.shift().expanding().mean())
)
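
If the current row should count toward the average, as in the expected table in the question, the shift can be dropped. Below is a minimal sketch, assuming the sample data from the question; the name per_horse is only for illustration. It also shows why the original attempt filled in only part of the column: groupby(...).mean() returns one value per horse, indexed by HorseId, so the assignment aligns on index labels instead of on rows.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "HorseId": [1736, 1736, 1736, 13973, 13973],
    "FGrating": [110, 124, 118, 144, 137],
})

# One value per horse, indexed by HorseId (two rows here, not five).
# Assigning this to a column matches on index labels, not on row position.
per_horse = df.groupby("HorseId")["FGrating"].mean()

# transform() returns a result aligned with the original rows, and
# expanding().mean() averages every rating up to and including the current row.
df["Average FGrating"] = (
    df.groupby("HorseId")["FGrating"]
      .transform(lambda s: s.expanding().mean())
)

print(df)
```

On the sample data this gives 110, 117, 117.33, 144 and 140.5, which matches the expected table.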