Home > Blockchain >  Getting Rolling Sum per Group
Getting Rolling Sum per Group

Time:05-25

I have a dataframe like this:

Product_ID    Quantity    Year    Quarter   
  1             100       2021      1          
  1             100       2021      2         
  1              50       2021      3          
  1             100       2021      4          
  1             100       2022      1         
  2             100       2021      1          
  2             100       2021      2          
  3             100       2021      1          
  3             100       2021      2         

I would like to get the Sum of the last three months (excluding the current month), per Product_ID.

Therefore I tried this:

df['Qty_Sum_3qrts'] = (df.groupby('Product_ID'['Quantity'].shift(1,fill_value=0)
                         .rolling(3).sum().reset_index(0,drop=True)
                       )

# Shifting 1, because I want to exclude the current row. 
# Rolling 3, because I want to have the 3 'rows' before 
# Grouping by, because I want to have the calculation PER product 

My code is failing, because it does not only calculate it per product, but it will give me also numbers for other products (let's say Product 2, quarter 1: gives me the 3 rows from product 1).

My proposed outcome:

Product_ID    Quantity    Year    Quarter   Qty_Sum_3qrts
  1             100       2021      1          0 # because we dont historical data for this id
  1             100       2021      2          100 # sum of last month of this product 
  1              50       2021      3          200 # sum of last 2 months of this product
  1             100       2021      4          250 # sum of last 3 months of this product
  1             100       2022      1          250 # sum of last 3 months of this product
  2             100       2021      1          0  # because we dont have hist data for this id
  2             100       2021      2          100 # sum of last month of this product
  3             100       2021      1          0   # etc
  3             100       2021      2          100  # etc 

CodePudding user response:

You need to apply the rolling sum per group, you can use apply for this:

df['Qty_Sum_3qrts'] = (df.groupby('Product_ID')['Quantity']
                         .apply(lambda s: s.shift(1,fill_value=0)
                                           .rolling(3, min_periods=1).sum())
                       )

output:

   Product_ID  Quantity  Year  Quarter  Qty_Sum_3qrts
0           1       100  2021        1            0.0
1           1       100  2021        2          100.0
2           1        50  2021        3          200.0
3           1       100  2021        4          250.0
4           1       100  2022        1          250.0
5           2       100  2021        1            0.0
6           2       100  2021        2          100.0
7           3       100  2021        1            0.0
8           3       100  2021        2          100.0
  • Related