I'm trying to simply sum the values between each specific set of dates that I have defined. I'm running into trouble as this isn't calendar year data, rather fiscal year. Looking to sum the values between the fiscal quarters. Any suggestions would be greatly appreciated!
import pandas as pd
from datetime import date
#sample dataframe
begin_date = '2017-01-01'
values = [*range(2100)]
df = pd.DataFrame({'values': values,
'dates':pd.date_range(begin_date, periods=len(values))})
df
#defined dates
F2Q17 = date(2017, 4, 1)
F3Q17 = date(2017, 7, 1)
F4Q17 = date(2017, 9, 30)
F1Q18 = date(2017, 12, 30)
F2Q18 = date(2018, 3, 31)
F3Q18 = date(2018, 6, 30)
F4Q18 = date(2018, 9, 29)
F1Q19 = date(2018, 12, 29)
F2Q19 = date(2019, 3, 30)
F3Q19 = date(2019, 6, 29)
F4Q19 = date(2019, 9, 28)
F1Q20 = date(2019, 12, 28)
F2Q20 = date(2020, 3, 28)
F3Q20 = date(2020, 6, 27)
F4Q20 = date(2020, 9, 26)
F1Q21 = date(2020, 12, 26)
F2Q21 = date(2021, 3, 27)
F3Q21 = date(2021, 6, 26)
F4Q21 = date(2021, 9, 25)
F1Q22 = date(2021, 12, 25)
F2Q22 = date(2022, 3, 26)
F3Q22 = date(2022, 6, 25)
F4Q22 = date(2022, 9, 25)
CodePudding user response:
You can try:
group_dates = [date(2017, 4, 1),
date(2017, 7, 1),
date(2017, 9, 30),
date(2017, 12, 30),
date(2018, 3, 31),
date(2018, 6, 30),
date(2018, 9, 29),
date(2018, 12, 29),
date(2019, 3, 30),
date(2019, 6, 29),
date(2019, 9, 28),
date(2019, 12, 28),
date(2020, 3, 28),
date(2020, 6, 27),
date(2020, 9, 26),
date(2020, 12, 26),
date(2021, 3, 27),
date(2021, 6, 26),
date(2021, 9, 25),
date(2021, 12, 25),
date(2022, 3, 26),
date(2022, 6, 25),
date(2022, 9, 25)]
df_groups = pd.DataFrame({"dates":pd.to_datetime(group_dates), "group":pd.to_datetime(group_dates)})
(
pd.merge(df, df_groups, on="dates", how="left")
.assign(group=lambda x: x.group.bfill())
.groupby("group")
[["values"]].sum()
.reset_index()
)
Output:
group values
0 2017-04-01 4095
1 2017-07-01 12376
2 2017-09-30 20657
3 2017-12-30 28938
4 2018-03-31 37219
5 2018-06-30 45500
6 2018-09-29 53781
7 2018-12-29 62062
8 2019-03-30 70343
9 2019-06-29 78624
10 2019-09-28 86905
11 2019-12-28 95186
12 2020-03-28 103467
13 2020-06-27 111748
14 2020-09-26 120029
15 2020-12-26 128310
16 2021-03-27 136591
17 2021-06-26 144872
18 2021-09-25 153153
19 2021-12-25 161434
20 2022-03-26 169715
21 2022-06-25 177996
22 2022-09-25 188370
CodePudding user response:
Try:
# put the required dates inside a list:
dates = [
date(2017, 4, 1),
date(2017, 7, 1),
date(2017, 9, 30),
date(2017, 12, 30),
date(2018, 3, 31),
date(2018, 6, 30),
date(2018, 9, 29),
date(2018, 12, 29),
date(2019, 3, 30),
date(2019, 6, 29),
date(2019, 9, 28),
date(2019, 12, 28),
date(2020, 3, 28),
date(2020, 6, 27),
date(2020, 9, 26),
date(2020, 12, 26),
date(2021, 3, 27),
date(2021, 6, 26),
date(2021, 9, 25),
date(2021, 12, 25),
date(2022, 3, 26),
date(2022, 6, 25),
date(2022, 9, 25),
]
# create a series by which we group the dataframe:
m = df["dates"].isin(dates).cumsum()
# group the dataframe, print some info:
for _, g in df.groupby(m):
print(f'Group {g["dates"].min().date()} - {g["dates"].max().date()}')
# uncomment to print whole group:
# print(g)
Prints:
Group 2017-01-01 - 2017-03-31
Group 2017-04-01 - 2017-06-30
Group 2017-07-01 - 2017-09-29
Group 2017-09-30 - 2017-12-29
Group 2017-12-30 - 2018-03-30
Group 2018-03-31 - 2018-06-29
Group 2018-06-30 - 2018-09-28
Group 2018-09-29 - 2018-12-28
Group 2018-12-29 - 2019-03-29
Group 2019-03-30 - 2019-06-28
Group 2019-06-29 - 2019-09-27
Group 2019-09-28 - 2019-12-27
Group 2019-12-28 - 2020-03-27
Group 2020-03-28 - 2020-06-26
Group 2020-06-27 - 2020-09-25
Group 2020-09-26 - 2020-12-25
Group 2020-12-26 - 2021-03-26
Group 2021-03-27 - 2021-06-25
Group 2021-06-26 - 2021-09-24
Group 2021-09-25 - 2021-12-24
Group 2021-12-25 - 2022-03-25
Group 2022-03-26 - 2022-06-24
Group 2022-06-25 - 2022-09-24
Group 2022-09-25 - 2022-10-01