I have a dataframe:
date id type revenue
0 2021-09-01 Zw b1 20.045350
1 2021-09-01 Aw c 8.990000
2 2021-09-01 Zc c 14.990000
3 2021-09-01 ww b 25.944510
4 2021-09-01 jw c 3.881649
5 2021-09-01 pw b 9.990000
6 2021-09-01 fg c 2.990000
7 2021-09-01 kl b 4.990000
8 2021-09-02 mm b 7.990000
I want to calculate mean revenue for each type, but not in group of type but in group of date. So for example mean type "b1" must be not 20.045350 (since there is only one b1 type) but 20.045350/8 = 2.5 (since there are 8 2021-09-01 values in column date). So desired result must be:
date type revenue
0 2021-09-01 b1 2.5
0 2021-09-01 c 3.85
0 2021-09-01 b 5.11
0 2021-09-02 b 7.990000
How to do that? groupby("date", "type").mean() brings wrong results:
date type revenue
0 2021-09-01 b1 20.045
0 2021-09-01 c 7.71
0 2021-09-01 b 13.64
0 2021-09-02 b 7.990000
CodePudding user response:
df1 = df.groupby('date')['id'].count().reset_index().\
rename({'id':'count'}, axis = 1).merge(df)
df2 = df1.assign(revenue = df1.revenue/df1['count']).groupby(['date','type']).\
agg({'revenue':sum}).reset_index()
df2
date type revenue
0 2021-09-01 b 5.115564
1 2021-09-01 b1 2.505669
2 2021-09-01 c 3.856456
3 2021-09-02 b 7.990000
A fancy way of doing the same will be:
df.groupby('date')['id'].count().reset_index().rename({'id':'count'}, axis = 1).merge(df).\
pipe(lambda x: x.assign(revenue = x.revenue/x['count'])).groupby(['date','type']).\
agg({'revenue':sum}).reset_index()
CodePudding user response:
Do a double groupby and divide them :
(df.groupby(['type', 'date'])
.revenue
.sum()
.div(df.date.value_counts(), level='date')
)
type date
b 2021-09-01 5.115564
2021-09-02 7.990000
b1 2021-09-01 2.505669
c 2021-09-01 3.856456
dtype: float64
Explanation :
- Get the counts for the dates:
counts = df.date.value_counts()
- Get the sum for the revenue, based on
type
anddate
:
revenue_sum = df.groupby(['type', 'date']).revenue.sum()
Divide revenue_sum
by counts
, using the date
level:
revenue_sum.div(counts, level='date')
type date
b 2021-09-01 5.115564
2021-09-02 7.990000
b1 2021-09-01 2.505669
c 2021-09-01 3.856456
dtype: float64