Home > other >  Grouping a dataframe by defined dates
Grouping a dataframe by defined dates

Time:10-15

I'm trying to simply sum the values between each specific set of dates that I have defined. I'm running into trouble as this isn't calendar year data, rather fiscal year. Looking to sum the values between the fiscal quarters. Any suggestions would be greatly appreciated!

import pandas as pd
from datetime import date

#sample dataframe
begin_date = '2017-01-01'
values = [*range(2100)]
df = pd.DataFrame({'values': values,
                  'dates':pd.date_range(begin_date, periods=len(values))})

df


#defined dates
F2Q17 =  date(2017, 4, 1)
F3Q17 =  date(2017, 7, 1)
F4Q17 =  date(2017, 9, 30)
F1Q18 =  date(2017, 12, 30)
F2Q18 =  date(2018, 3, 31)
F3Q18 =  date(2018, 6, 30)
F4Q18 =  date(2018, 9, 29)
F1Q19 =  date(2018, 12, 29)
F2Q19 =  date(2019, 3, 30)
F3Q19 =  date(2019, 6, 29)
F4Q19 =  date(2019, 9, 28)
F1Q20 =  date(2019, 12, 28)
F2Q20 =  date(2020, 3, 28)
F3Q20 =  date(2020, 6, 27)
F4Q20 =  date(2020, 9, 26)
F1Q21 =  date(2020, 12, 26)
F2Q21 =  date(2021, 3, 27)
F3Q21 =  date(2021, 6, 26)
F4Q21 =  date(2021, 9, 25)
F1Q22 =  date(2021, 12, 25)
F2Q22 =  date(2022, 3, 26)
F3Q22 =  date(2022, 6, 25)
F4Q22 = date(2022, 9, 25)

CodePudding user response:

You can try:

group_dates = [date(2017, 4, 1),
date(2017, 7, 1),
date(2017, 9, 30),
date(2017, 12, 30),
date(2018, 3, 31),
date(2018, 6, 30),
date(2018, 9, 29),
date(2018, 12, 29),
date(2019, 3, 30),
date(2019, 6, 29),
date(2019, 9, 28),
date(2019, 12, 28),
date(2020, 3, 28),
date(2020, 6, 27),
date(2020, 9, 26),
date(2020, 12, 26),
date(2021, 3, 27),
date(2021, 6, 26),
date(2021, 9, 25),
date(2021, 12, 25),
date(2022, 3, 26),
date(2022, 6, 25),
date(2022, 9, 25)]

df_groups = pd.DataFrame({"dates":pd.to_datetime(group_dates), "group":pd.to_datetime(group_dates)})

(
    pd.merge(df, df_groups, on="dates", how="left")
    .assign(group=lambda x: x.group.bfill())
    .groupby("group")
    [["values"]].sum()
    .reset_index()
)

Output:

    group       values
0   2017-04-01  4095
1   2017-07-01  12376
2   2017-09-30  20657
3   2017-12-30  28938
4   2018-03-31  37219
5   2018-06-30  45500
6   2018-09-29  53781
7   2018-12-29  62062
8   2019-03-30  70343
9   2019-06-29  78624
10  2019-09-28  86905
11  2019-12-28  95186
12  2020-03-28  103467
13  2020-06-27  111748
14  2020-09-26  120029
15  2020-12-26  128310
16  2021-03-27  136591
17  2021-06-26  144872
18  2021-09-25  153153
19  2021-12-25  161434
20  2022-03-26  169715
21  2022-06-25  177996
22  2022-09-25  188370

CodePudding user response:

Try:

# put the required dates inside a list:

dates = [
    date(2017, 4, 1),
    date(2017, 7, 1),
    date(2017, 9, 30),
    date(2017, 12, 30),
    date(2018, 3, 31),
    date(2018, 6, 30),
    date(2018, 9, 29),
    date(2018, 12, 29),
    date(2019, 3, 30),
    date(2019, 6, 29),
    date(2019, 9, 28),
    date(2019, 12, 28),
    date(2020, 3, 28),
    date(2020, 6, 27),
    date(2020, 9, 26),
    date(2020, 12, 26),
    date(2021, 3, 27),
    date(2021, 6, 26),
    date(2021, 9, 25),
    date(2021, 12, 25),
    date(2022, 3, 26),
    date(2022, 6, 25),
    date(2022, 9, 25),
]

# create a series by which we group the dataframe:
m = df["dates"].isin(dates).cumsum()

# group the dataframe, print some info:
for _, g in df.groupby(m):
    print(f'Group {g["dates"].min().date()} - {g["dates"].max().date()}')

    # uncomment to print whole group:
    # print(g)

Prints:

Group 2017-01-01 - 2017-03-31
Group 2017-04-01 - 2017-06-30
Group 2017-07-01 - 2017-09-29
Group 2017-09-30 - 2017-12-29
Group 2017-12-30 - 2018-03-30
Group 2018-03-31 - 2018-06-29
Group 2018-06-30 - 2018-09-28
Group 2018-09-29 - 2018-12-28
Group 2018-12-29 - 2019-03-29
Group 2019-03-30 - 2019-06-28
Group 2019-06-29 - 2019-09-27
Group 2019-09-28 - 2019-12-27
Group 2019-12-28 - 2020-03-27
Group 2020-03-28 - 2020-06-26
Group 2020-06-27 - 2020-09-25
Group 2020-09-26 - 2020-12-25
Group 2020-12-26 - 2021-03-26
Group 2021-03-27 - 2021-06-25
Group 2021-06-26 - 2021-09-24
Group 2021-09-25 - 2021-12-24
Group 2021-12-25 - 2022-03-25
Group 2022-03-26 - 2022-06-24
Group 2022-06-25 - 2022-09-24
Group 2022-09-25 - 2022-10-01
  • Related