I have a Dataframe that looks like this
WEEK | DELIVERY_BOY_ID | TOTAL_GMV |
---|---|---|
2022-04-04 | 999999999.0 | 470510.11 |
2022-04-11 | 999999999.0 | 557351.02 |
2022-04-18 | 999999999.0 | 454225.78 |
2022-04-25 | 999999999.0 | 527932.46 |
2022-05-02 | 999999999.0 | 556741.18 |
2022-05-09 | 999999999.0 | 524571.93 |
2022-05-16 | 999999999.0 | 547195.66 |
2022-05-23 | 999999999.0 | 112423.49 |
What I'm trying to do is to sum the TOTAL_GMV every 4 weeks (sum from week 2022-05-02 to 2022-05-23, sum from 2022-04-25 to 2022-05-16 and so on for every week), and to show me the result with the date of the last week that was summed
So, the final result should look something like this:
WEEK | DELIVERY_BOY_ID | TOTAL_GMV | EXPLANATION |
---|---|---|---|
2022-04-04 | 999999999.0 | ********* | Sum from 2022-03-14 to 2022-04-04 |
2022-04-11 | 999999999.0 | ********* | Sum from 2022-03-21 to 2022-04-11 |
2022-04-18 | 999999999.0 | ********* | Sum from 2022-03-28 to 2022-04-18 |
2022-04-25 | 999999999.0 | 2.010.018,91 | Sum from 2022-04-04 to 2022-04-25 |
2022-05-02 | 999999999.0 | 2.096.250,44 | Sum from 2022-04-11 to 2022-05-02 |
2022-05-09 | 999999999.0 | 2.063.469,15 | Sum from 2022-04-18 to 2022-05-09 |
2022-05-16 | 999999999.0 | 2.156.441,23 | Sum from 2022-04-25 to 2022-05-16 |
2022-05-23 | 999999999.0 | 1.639.932,26 | Sum from 2022-05-02 to 2022-05-23 |
Any idea how to do it?
Thanks!!
CodePudding user response:
Assuming that WEEK
is the index, you can do:
>>> df.TOTAL_GMV.rolling(4).sum()
WEEK
2022-04-04 NaN
2022-04-11 NaN
2022-04-18 NaN
2022-04-25 2010019.37
2022-05-02 2096250.44
2022-05-09 2063471.35
2022-05-16 2156441.23
2022-05-23 1740932.26
Name: TOTAL_GMV, dtype: float64
To add it to the df,
df['TOTAL_GMV'] = df.TOTAL_GMV.rolling(4).sum()
(If it's not the index, change that to df.set_index('WEEK').TOTAL_GMV.rolling(4).sum()
.)