Calculate daily and monthly averages for categories in dataframe-CodePudding

I have the following dataframe

date        sales      cat
29/4/2022   2          a
30/4/2022   5          a
30/4/2022   1          b
1/5/2022    1          a
1/5/2022    8          b 
1/5/2022    4          c
1/6/2022    7          a
1/6/2022    9          b
1/6/2022    5          c

I want to calculate the daily mean and the monthly mean of sales for each cat. The end dataframe should look like this

    date        sales      cat    daily_avg    monthly_avg
    29/4/2022   2          a     2            2
    30/4/2022   5          a     3.5          7
    30/4/2022   1          b     1            1
    1/5/2022    1          a     2.67         4
    1/5/2022    8          b     4.5          4.5
    1/5/2022    4          c     4            4
    1/6/2022    7          a     0.45         5
    1/6/2022    9          b     0.56         6
    1/6/2022    5          c     0.29         4.5

How can I implement this?

CodePudding user response：

USE-

#Monthly Average
df['monthly_avg'] = df.groupby(pd.PeriodIndex(df['date'], freq="M"))['Value'].mean()

#Daily Average
df['daily_avg'] = df.groupby(pd.PeriodIndex(df['date'], freq="D"))['Value'].mean()

This is untested code,always share reproducible code for test datasets.

Ref link- Get monthly average in pandas

CodePudding user response：

You should use pandas expanding windowing operations: https://pandas.pydata.org/docs/user_guide/window.html

Btw your monthly average is not right (for example April should be 7/2 for cat a and 1/1 for cat b).

Something like this would count daily average:

df.groupBy(['date', 'cat']).expanding().avg()

For monthly create column containing just month and year and do similar thing.

CodePudding user response：

import pandas as pd
import numpy as np

rng = pd.date_range('29/4/2022 ', periods=9, freq='D')
sales = [2, 5, 1, 1, 8, 4, 7, 9, 5]
cat = ["a", "a", "b", "a", "b", "c", "a", "b", "c"]

df = pd.DataFrame({'Date': rng, 'sales': sales, 'cat': cat})
df['day'] = df['Date'].dt.to_period('D').astype(str)   
df['cat'].astype(str)
df['month'] = df['Date'].dt.to_period('M').astype(str)   
df['cat'].astype(str)

pivot_D = df.pivot_table(index='day', aggfunc={'sales': np.average})
pivot_D = pd.DataFrame(pivot_D)
pivot_M = df.pivot_table(index='month', aggfunc={'sales': np.average})
pivot_M = pd.DataFrame(pivot_M)

pivot_D_df_v = pd.merge(df, pivot_D, on='day', how='left')
pivot_D_df_v = pd.DataFrame(pivot_D_df_v)
pivot_D_df_v = pd.merge(pivot_D_df_v, pivot_M, on='month', how='left')
pivot_D_df_v = pd.DataFrame(pivot_D_df_v)

pivot_D_df_v.columns = pivot_D_df_v.columns.str.replace('sales_y',     'daily_avg')
pivot_D_df_v.columns = pivot_D_df_v.columns.str.replace('sales_x', 'sale')
pivot_D_df_v.columns = pivot_D_df_v.columns.str.replace('sales', 'monthly_avg')

pivot_D_df_v.drop(['day', 'month'], axis=1, inplace=True)

print(pivot_D_df_v)

Create an helper column for days and months.
Pivot the created columns.
Combine, merge with the main base.
Renaming columns (automatically assigned in the process)
Delete the helper columns.