First off I would like to say I checked out [this answer][1] and I couldn't progress with the information there.
So I have a dataset as such
df = pd.DataFrame({'ID': [10010,10010,10010,10010,10010,10010,10010,10010,10010,10010,10010,10010,10010,10010,10010,10010,10013,10013,10013,10013,10013,10013,10013,10013,10013,10013,10013,10013,10013,10013,10013,10013],
'Type': ['Red','Blue','Red','Blue','Red','Blue','Red','Blue','Red','Blue','Red','Blue','Red','Blue','Red','Blue','Red','Blue','Red','Blue','Red','Blue','Red','Blue','Red','Blue','Red','Blue','Red','Blue','Red','Blue'],
'Year': [2018,2018,2019,2019,2020,2020,2021,2021,2022,2022,2018,2018,2019,2019,2021,2021,2018,2018,2019,2019,2020,2020,2021,2021,2022,2022,2018,2018,2019,2019,2021,2021],
'Score': [0,0,0,0,0,0,0,0,0,0,14,24,16,5,87,33,0,0,0,0,0,0,0,0,0,0,11,13,3,16,37,49]})
I don't know how to type in NaN so I make it like this:
df.replace(0, np.nan, inplace=True)
df = df.dropna(axis=0, subset=['Score'])
I want to take the rolling N (In this case 3) Year average of score by ID and Subject.
I can get the 3 year rolling average score by ID:
df['average_past_3_years'] = df.groupby(['ID'], as_index = False).rolling(3).agg( {'Score':'mean', 'Year': 'max'}).reset_index(level=0).groupby( 'Year').transform('shift')['Score'] df = df.sort_values(['ID', 'Year'])
That gives me the rolling average but only by year on ID, not subject as well:
ID Type Year Score average_past_3_years
10010 Red 2018 14.0 NaN
10010 Blue 2018 24.0 NaN
10010 Red 2019 16.0 NaN
10010 Blue 2019 5.0 18.000000
10010 Red 2021 87.0 NaN
10010 Blue 2021 33.0 36.000000
10013 Red 2018 11.0 NaN
10013 Blue 2018 13.0 NaN
10013 Red 2019 3.0 15.000000
10013 Blue 2019 16.0 9.000000
10013 Red 2021 37.0 41.666667
10013 Blue 2021 49.0 18.666667
I am trying to out put say 10013 for red the 2021 would be 17, not 41 as it would only be calculating the red scores.
I tried:
df['average_past_3_years'] = df.groupby([['ID', 'Type']], as_index = False).rolling(3).agg(
{'Score':'mean', 'Year': 'max'}).reset_index(level=0).groupby(
'Year').transform('shift')['Score']
but got this error:
Grouper and axis must be same length
And kinda got stuck there
I'm also unsure if I need to sort before hand
[1]: https://stackoverflow.com/questions/61837481/pandas-how-to-calculate-mean-values-of-the-past-n-years-for-every-month
CodePudding user response:
IIUC use:
df1 = df.groupby(['ID','Type']).rolling(3).agg( {'Score':'mean', 'Year': 'max'}).droplevel([0,1])
df = df.join(df1.add_suffix('_r'))
df['Score_r'] = df.groupby('Year')['Score_r'].shift()