I have the following dataframe
date sales cat
29/4/2022 2 a
30/4/2022 5 a
30/4/2022 1 b
1/5/2022 1 a
1/5/2022 8 b
1/5/2022 4 c
1/6/2022 7 a
1/6/2022 9 b
1/6/2022 5 c
I want to calculate the daily mean and the monthly mean of sales for each cat
.
The end dataframe should look like this
date sales cat daily_avg monthly_avg
29/4/2022 2 a 2 2
30/4/2022 5 a 3.5 7
30/4/2022 1 b 1 1
1/5/2022 1 a 2.67 4
1/5/2022 8 b 4.5 4.5
1/5/2022 4 c 4 4
1/6/2022 7 a 0.45 5
1/6/2022 9 b 0.56 6
1/6/2022 5 c 0.29 4.5
How can I implement this?
CodePudding user response:
USE-
#Monthly Average
df['monthly_avg'] = df.groupby(pd.PeriodIndex(df['date'], freq="M"))['Value'].mean()
#Daily Average
df['daily_avg'] = df.groupby(pd.PeriodIndex(df['date'], freq="D"))['Value'].mean()
This is untested code,always share reproducible code for test datasets.
Ref link- Get monthly average in pandas
CodePudding user response:
You should use pandas expanding
windowing operations: https://pandas.pydata.org/docs/user_guide/window.html
Btw your monthly average is not right (for example April should be 7/2 for cat a and 1/1 for cat b).
Something like this would count daily average:
df.groupBy(['date', 'cat']).expanding().avg()
For monthly create column containing just month and year and do similar thing.
CodePudding user response:
import pandas as pd
import numpy as np
rng = pd.date_range('29/4/2022 ', periods=9, freq='D')
sales = [2, 5, 1, 1, 8, 4, 7, 9, 5]
cat = ["a", "a", "b", "a", "b", "c", "a", "b", "c"]
df = pd.DataFrame({'Date': rng, 'sales': sales, 'cat': cat})
df['day'] = df['Date'].dt.to_period('D').astype(str)
df['cat'].astype(str)
df['month'] = df['Date'].dt.to_period('M').astype(str)
df['cat'].astype(str)
pivot_D = df.pivot_table(index='day', aggfunc={'sales': np.average})
pivot_D = pd.DataFrame(pivot_D)
pivot_M = df.pivot_table(index='month', aggfunc={'sales': np.average})
pivot_M = pd.DataFrame(pivot_M)
pivot_D_df_v = pd.merge(df, pivot_D, on='day', how='left')
pivot_D_df_v = pd.DataFrame(pivot_D_df_v)
pivot_D_df_v = pd.merge(pivot_D_df_v, pivot_M, on='month', how='left')
pivot_D_df_v = pd.DataFrame(pivot_D_df_v)
pivot_D_df_v.columns = pivot_D_df_v.columns.str.replace('sales_y', 'daily_avg')
pivot_D_df_v.columns = pivot_D_df_v.columns.str.replace('sales_x', 'sale')
pivot_D_df_v.columns = pivot_D_df_v.columns.str.replace('sales', 'monthly_avg')
pivot_D_df_v.drop(['day', 'month'], axis=1, inplace=True)
print(pivot_D_df_v)
- Create an helper column for days and months.
- Pivot the created columns.
- Combine, merge with the main base.
- Renaming columns (automatically assigned in the process)
- Delete the helper columns.