I'm working on a fun side project and would like to compute a moving sum for number of wins for NBA teams over 2 year periods. Consider the sample pandas dataframe below,
pd.DataFrame({'Team':['Hawks','Hawks','Hawks','Hawks','Hawks'], 'Season':[1970,1971,1972,1973,1974],'Wins':[40,34,30,46,42]})
I would ideally like to compute the sum of the number of wins between 1970 and 1971, 1971 and 1972, 1972 and 1973, etc. An inefficient way would be to use a loop, is there a way to do this using the .groupby function?
CodePudding user response:
This is a little bit of a hack, but you could group by df['Season'] // 2 * 2
, which means dividing by two, taking a floor operation, then multiplying by two again. The effect is to round each year to a multiple of two.
df_sum = pd.DataFrame(df.groupby(['Team', df['Season'] // 2 * 2])['Wins'].sum()).reset_index()
Output:
Team Season Wins
0 Hawks 1970 74
1 Hawks 1972 76
2 Hawks 1974 42
CodePudding user response:
If you have years ordered for each team you can just use rolling
with groupby
on command. For example:
import pandas as pd
df = pd.DataFrame({'Team':['Hawks','Hawks','Hawks','Hawks','Hawks'], 'Season':[1970,1971,1972,1973,1974],'Wins':[40,34,30,46,42]})
res = df.groupby('Team')['Wins'].rolling(2).sum()
print(res)
Out:
Team
Hawks 0 NaN
1 74.0
2 64.0
3 76.0
4 88.0