Home > database >  Pandas : Getting a cumulative sum for each month on the last friday
Pandas : Getting a cumulative sum for each month on the last friday

Time:05-23

I got a dataframe which look like this :

date_order               date_despatch          date_validation        qty_ordered
2019-01-01 00:00:00     2019-11-01 00:00:00     2019-13-01 00:00:00    4.15
2019-01-01 00:00:00     2019-12-01 00:00:00     2019-14-01 00:00:00    5.9
2019-02-01 00:00:00     2019-16-01 00:00:00     2019-19-01 00:00:00    7.8
2019-03-01 00:00:00     2019-18-01 00:00:00     2019-20-01 00:00:00    9.6
2019-04-01 00:00:00     2019-22-01 00:00:00     2019-24-01 00:00:00    1.3
...
2019-03-02 00:00:00     2019-22-02 00:00:00     2019-25-02 00:00:00    1.2

My goal is to get, for each month, a cumulative sum of the quantity ordered from the start of the month to the last friday of this same month (e.g : 2019-01-01 to 2019-25-01 for January 2019)

What would be expected :

date_order             cumulative_ordered
2019-01-01 00:00:00    10.05
2019-02-01 00:00:00    17.85
...                    ...
2019-24-01 00:00:00    150
2019-25-01 00:00:00    157

Can anyone help me on this?

CodePudding user response:

With an example df with qty_ordered always 1 (so we can easily keep track of the result):

import pandas as pd

df = pd.DataFrame({'date_order': pd.date_range('2019-01-01', '2019-03-01')})
df['qty_ordered'] = 1

print(df)
   date_order  qty_ordered
0  2019-01-01            1
1  2019-01-02            1
2  2019-01-03            1
3  2019-01-04            1
4  2019-01-05            1
5  2019-01-06            1
6  2019-01-07            1
7  2019-01-08            1
...
59 2019-03-01            1

Last Friday of Jan 2019 was the 2019-01-25, while in February it was 2019-02-22. We keep that in mind to verify the cumsums.

You could do:

# Make sure dates are sorted.
df = df.sort_values('date_order')

# Flag the Fridays.
df['n_friday'] = df['date_order'].dt.dayofweek.eq(4)

# Column to groupby.
df['year_month'] = df['date_order'].dt.to_period("M")

# Remove days past the last Friday in each year/month group.
mask = df.groupby('year_month')['n_friday'].transform(lambda s: s.cumsum().shift().fillna(0).lt(4))
res_df = df[mask].drop(columns=['n_friday'])

# Calculate cumsum for each month.
res_df['cumulative_ordered'] = res_df.groupby('year_month')['qty_ordered'].cumsum()

print(res_df.drop(columns=['year_month']))
   date_order  qty_ordered  ordered_cumusm
0  2019-01-01            1               1
1  2019-01-02            1               2
2  2019-01-03            1               3
3  2019-01-04            1               4
4  2019-01-05            1               5
5  2019-01-06            1               6
6  2019-01-07            1               7
7  2019-01-08            1               8
...
52 2019-02-22            1              22
59 2019-03-01            1               1

To check the cumsum and day selection worked:

print(res_df.groupby('year_month').last())
           date_order  qty_ordered  cumulative_ordered
year_month                                            
2019/01    2019-01-25            1                  25
2019/02    2019-02-22            1                  22
2019/03    2019-03-01            1                   1
  • Related