Pandas - calculate monthly average from data with mixed frequencies

Time:06-27

Suppose I have a dataset consisting of monthly, quarterly and annual average occurrences of an event:

import numpy as np
import pandas as pd

multi_index = pd.MultiIndex.from_tuples([("2022-01-01", "2022-12-31"), 
                                  ("2022-01-01", "2022-03-30"), 
                                  ("2022-03-01", "2022-03-30"),
                                  ("2022-04-01", "2022-04-30")])

multi_index.names = ['period_begin', 'period_end']

df = pd.DataFrame(np.random.randint(10, size=4), index=multi_index)
df

                         0
period_begin period_end   
2022-01-01   2022-12-31  4
             2022-03-30  3
2022-03-01   2022-03-30  5
2022-04-01   2022-04-30  8

I want to calculate a monthly value as the (simple) sum of all observations whose period covers that month. For instance, the value for March 2022 should equal the sum of the observations for March 2022, Q1 2022 and the full year 2022 (5 + 3 + 4 = 12). For April 2022, it is the sum of the observations for April 2022 and the full year 2022 (Q2 2022 does not appear in the data, so it contributes nothing). In the end, what I would like to have is:

month_begin  Monthly_Avg                    
2022-01-01   7
2022-02-01   7
2022-03-01   12
2022-04-01   12
...
2022-12-01   4

I tried pd.Grouper() but it didn't work. Does anybody have an idea? I would be grateful!

Answer:

Use pd.date_range in a list comprehension to expand each period into its month-start dates, build a DataFrame from the resulting (month, value) pairs, then group by month and sum:

L = [(x, v) for (s, e), v in df[0].items() for x in pd.date_range(s, e, freq='MS')]

df = (pd.DataFrame(L, columns=['month_begin','Data'])
        .groupby('month_begin', as_index=False)['Data']
        .sum())
print(df)

   month_begin  Data
0   2022-01-01     7
1   2022-02-01     7
2   2022-03-01    12
3   2022-04-01    12
4   2022-05-01     4
5   2022-06-01     4
6   2022-07-01     4
7   2022-08-01     4
8   2022-09-01     4
9   2022-10-01     4
10  2022-11-01     4
11  2022-12-01     4
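
For anyone who wants to reproduce this without the random data, here is a minimal self-contained sketch of the same approach. It assumes the fixed values 4, 3, 5 and 8 shown in the question's printed table in place of np.random.randint, and the names pairs and monthly are only illustrative.

```python
import pandas as pd

# Fixed values copied from the table printed in the question
# (an assumption: the original frame is filled with random integers).
multi_index = pd.MultiIndex.from_tuples(
    [("2022-01-01", "2022-12-31"),
     ("2022-01-01", "2022-03-30"),
     ("2022-03-01", "2022-03-30"),
     ("2022-04-01", "2022-04-30")],
    names=["period_begin", "period_end"],
)
df = pd.DataFrame([4, 3, 5, 8], index=multi_index)

# Emit one (month_begin, value) pair for every month start
# that falls inside each [period_begin, period_end] range.
pairs = [(month, value)
         for (start, end), value in df[0].items()
         for month in pd.date_range(start, end, freq="MS")]

# Sum all values that landed on the same month.
monthly = (pd.DataFrame(pairs, columns=["month_begin", "Data"])
             .groupby("month_begin")["Data"]
             .sum())
print(monthly)
```

One thing to watch: freq="MS" only yields dates that are the first day of a month, so a period that begins mid-month would not contribute to its starting month. That is not an issue here because every period_begin is a month start.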