I have a dataframe like as shown below
score,region,supplier
27,AP,ABC
35,AP,ABC
29,ANZ,DEF
35,ANZ,DEF
30,KOREA,GHI
34,KOREA,GHI
I was working on feature engineering activities and found out that we have a quarterly score generated for each supplier based on their performance and this score ranges from 0 to 35.
Meaning, a supplier gets this score, 4 times every year (Q1,Q2,Q3 and Q4) based on their performance. 0 indicates poor performance and 35 indicates best performance.
What I did was I computed the moving average of these scores for a supplier based on their history (past data) till current time point.
However, when I do that I realize that the mov_avg_score
goes beyond the range of 35.
Meaning, the mov_avg_score column contains values like 37.23, 52 etc?
Is this even possible? How can a derived score (computed using moving average(mean)) exceed its range of original score? shouldn't the moving average score be also on the same range?
In case, if you are interested to look at my code, please find below
df['mov_avg_score'] = df.groupby(['supplier','region'])['score'].expanding().mean().shift().values
CodePudding user response:
df['rolling_mean']=df.groupby(['supplier','region'],as_index=False,)['score'].rolling(2, min_periods=1).mean()['score']
score region supplier rolling_mean
0 27 AP ABC 27.0
1 35 AP ABC 31.0
2 29 ANZ DEF 29.0
3 35 ANZ DEF 32.0
4 30 KOREA GHI 30.0
5 34 KOREA GHI 32.0
Following your comment, use size of largest group as window
s=df.groupby(['supplier','region'])['score'].size().max()#compute window size
#rolling mean
df['rolling_mean']=df.groupby(['supplier','region'],as_index=False,)['score'].rolling(s,min_periods=1).mean()['score']