Home > Enterprise >  How can I compute a rolling sum using groupby in pandas?
How can I compute a rolling sum using groupby in pandas?

Time:12-02

I'm working on a fun side project and would like to compute a moving sum for number of wins for NBA teams over 2 year periods. Consider the sample pandas dataframe below,

pd.DataFrame({'Team':['Hawks','Hawks','Hawks','Hawks','Hawks'], 'Season':[1970,1971,1972,1973,1974],'Wins':[40,34,30,46,42]})

I would ideally like to compute the sum of the number of wins between 1970 and 1971, 1971 and 1972, 1972 and 1973, etc. An inefficient way would be to use a loop, is there a way to do this using the .groupby function?

CodePudding user response:

This is a little bit of a hack, but you could group by df['Season'] // 2 * 2, which means dividing by two, taking a floor operation, then multiplying by two again. The effect is to round each year to a multiple of two.

df_sum = pd.DataFrame(df.groupby(['Team', df['Season'] // 2 * 2])['Wins'].sum()).reset_index()

Output:

    Team  Season  Wins
0  Hawks    1970    74
1  Hawks    1972    76
2  Hawks    1974    42

CodePudding user response:

If you have years ordered for each team you can just use rolling with groupby on command. For example:

import pandas as pd

df = pd.DataFrame({'Team':['Hawks','Hawks','Hawks','Hawks','Hawks'], 'Season':[1970,1971,1972,1973,1974],'Wins':[40,34,30,46,42]})


res = df.groupby('Team')['Wins'].rolling(2).sum()
print(res)

Out:
     Team    
Hawks  0     NaN
       1    74.0
       2    64.0
       3    76.0
       4    88.0
  • Related