Suppose I have a dataset consisting of monthly, quarterly and annual average occurrences of an event:
import numpy as np
import pandas as pd

multi_index = pd.MultiIndex.from_tuples([("2022-01-01", "2022-12-31"),
                                         ("2022-01-01", "2022-03-30"),
                                         ("2022-03-01", "2022-03-30"),
                                         ("2022-04-01", "2022-04-30")])
multi_index.names = ['period_begin', 'period_end']
# random integers in [0, 10); the printout below is one example run
df = pd.DataFrame(np.random.randint(10, size=4), index=multi_index)
df
                         0
period_begin period_end
2022-01-01   2022-12-31  4
             2022-03-30  3
2022-03-01   2022-03-30  5
2022-04-01   2022-04-30  8
I want to calculate the monthly averages as a simple sum of these overlapping data. For instance, the value for March 2022 should equal the sum of the observations for March 2022, Q1 2022 and the year 2022. For April 2022, it is the sum of April 2022 and the year 2022 (Q2 2022 does not appear in the data and has no observation). In the end, what I would like to have is:
month_begin  Monthly_Avg
2022-01-01             7
2022-02-01             7
2022-03-01            12
2022-04-01            12
...
2022-12-01             4
I tried pd.Grouper(), but it didn't work. Does anybody have an idea? I would be grateful!
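The rule described above can be written as a boolean coverage matrix: a period contributes to a month when that month's first day falls inside [period_begin, period_end]. This reading of "overlap" is an assumption, though it reproduces the January to April values in the example. A minimal sketch, using the printed values 4, 3, 5, 8 instead of random ones; `covers` and `monthly` are illustrative names, not part of any library:

```python
import pandas as pd

# Question's periods, with the values from the printout instead of random ones
idx = pd.MultiIndex.from_tuples(
    [("2022-01-01", "2022-12-31"),
     ("2022-01-01", "2022-03-30"),
     ("2022-03-01", "2022-03-30"),
     ("2022-04-01", "2022-04-30")],
    names=["period_begin", "period_end"],
)
df = pd.DataFrame({0: [4, 3, 5, 8]}, index=idx)

# Period bounds as datetime64 arrays
begin = pd.to_datetime(df.index.get_level_values("period_begin")).to_numpy()
end = pd.to_datetime(df.index.get_level_values("period_end")).to_numpy()

# Every month start between the earliest begin and the latest end
months = pd.date_range(begin.min(), end.max(), freq="MS")
m = months.to_numpy()

# covers[i, j] is True when month j's first day lies inside period i (assumed overlap rule)
covers = (begin[:, None] <= m[None, :]) & (m[None, :] <= end[:, None])

# Multiply each period's value into the months it covers, then sum per month
values = df[0].to_numpy()
monthly = pd.DataFrame({
    "month_begin": months,
    "Monthly_Avg": (values[:, None] * covers).sum(axis=0),
})
print(monthly)
```

The matrix has one cell per (period, month) pair, which is fine at this size but grows with both dimensions; expanding each period only into the months it actually covers scales better for long series.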
Answer:
Use date_range in a list comprehension to expand each period into the month starts it covers, then create a DataFrame and aggregate with sum:
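# Expand each (begin, end) period into its month starts, paired with the period's value
L = [(x, v) for (s, e), v in df[0].items() for x in pd.date_range(s, e, freq='MS')]
# One row per (month, value) pair, then sum the values per month
df = (pd.DataFrame(L, columns=['month_begin', 'Data'])
        .groupby('month_begin', as_index=False)['Data']
        .sum())
print(df)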
   month_begin  Data
0   2022-01-01     7
1   2022-02-01     7
2   2022-03-01    12
3   2022-04-01    12
4   2022-05-01     4
5   2022-06-01     4
6   2022-07-01     4
7   2022-08-01     4
8   2022-09-01     4
9   2022-10-01     4
10  2022-11-01     4
11  2022-12-01     4
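The same expand-then-sum idea can also be written with DataFrame.explode instead of building a list of tuples. A sketch under the same assumptions as the answer (month starts generated with freq='MS'), using the printed values 4, 3, 5, 8; the names `long` and `out` are only illustrative:

```python
import pandas as pd

# Question's periods, with the values from the printout instead of random ones
idx = pd.MultiIndex.from_tuples(
    [("2022-01-01", "2022-12-31"),
     ("2022-01-01", "2022-03-30"),
     ("2022-03-01", "2022-03-30"),
     ("2022-04-01", "2022-04-30")],
    names=["period_begin", "period_end"],
)
df = pd.DataFrame({0: [4, 3, 5, 8]}, index=idx)

# One row per period: period_begin, period_end, Data
long = df[0].rename("Data").reset_index()

# Attach the list of month starts each period covers
long["month_begin"] = [
    list(pd.date_range(s, e, freq="MS"))
    for s, e in zip(long["period_begin"], long["period_end"])
]

# One row per (period, month) pair; restore a datetime dtype after exploding
long = long.explode("month_begin", ignore_index=True)
long["month_begin"] = pd.to_datetime(long["month_begin"])

# Sum all overlapping periods per month
out = long.groupby("month_begin", as_index=False)["Data"].sum()
print(out)
```

One caveat applies to both versions: freq='MS' only yields month starts that fall inside [start, end], so a period such as 2022-01-15 to 2022-02-14 would count toward February only. If partial months should count, the period start would need to be rolled back to the first of its month before calling date_range.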