How to calculate mean in group by another group?

Time:11-02

I have a dataframe:

    date                    id       type        revenue
0 2021-09-01                Zw        b1         20.045350
1 2021-09-01                Aw        c          8.990000
2 2021-09-01                Zc        c          14.990000
3 2021-09-01                ww        b          25.944510
4 2021-09-01                jw        c          3.881649
5 2021-09-01                pw        b          9.990000
6 2021-09-01                fg        c          2.990000
7 2021-09-01                kl        b          4.990000
8 2021-09-02                mm        b          7.990000

I want to calculate the mean revenue for each type, but averaged over the date group rather than the type group. For example, the mean for type "b1" must not be 20.045350 (there is only one "b1" row) but 20.045350 / 8 ≈ 2.51, since there are 8 rows with date 2021-09-01. So the desired result must be:

    date                    type      revenue
0 2021-09-01                b1        2.51
1 2021-09-01                c         3.86
2 2021-09-01                b         5.12
3 2021-09-02                b         7.99

How can I do that? df.groupby(["date", "type"]).mean() gives the wrong result:

    date                    type      revenue
0 2021-09-01                b1        20.045
1 2021-09-01                c         7.71
2 2021-09-01                b         13.64
3 2021-09-02                b         7.990000

CodePudding user response:

# Count rows per date, attach the count to every row via merge,
# divide each revenue by its date's count, then sum per (date, type).
df1 = (df.groupby('date')['id'].count()
         .reset_index()
         .rename({'id': 'count'}, axis=1)
         .merge(df))

df2 = (df1.assign(revenue=df1['revenue'] / df1['count'])
          .groupby(['date', 'type'])
          .agg({'revenue': 'sum'})
          .reset_index())

df2 
         date type   revenue
0  2021-09-01    b  5.115564
1  2021-09-01   b1  2.505669
2  2021-09-01    c  3.856456
3  2021-09-02    b  7.990000
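The steps above can be run end to end on the question's sample data (a self-contained sketch; the DataFrame is rebuilt inline):

```python
import pandas as pd

# Rebuild the question's sample data.
df = pd.DataFrame({
    'date': ['2021-09-01'] * 8 + ['2021-09-02'],
    'id':   ['Zw', 'Aw', 'Zc', 'ww', 'jw', 'pw', 'fg', 'kl', 'mm'],
    'type': ['b1', 'c', 'c', 'b', 'c', 'b', 'c', 'b', 'b'],
    'revenue': [20.045350, 8.990000, 14.990000, 25.944510,
                3.881649, 9.990000, 2.990000, 4.990000, 7.990000],
})

# Attach each date's row count to every row, divide revenue by it,
# then sum per (date, type).
df1 = (df.groupby('date')['id'].count()
         .reset_index()
         .rename({'id': 'count'}, axis=1)
         .merge(df))
df2 = (df1.assign(revenue=df1['revenue'] / df1['count'])
          .groupby(['date', 'type'])
          .agg({'revenue': 'sum'})
          .reset_index())
print(df2)
```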

The same thing can be written as a single chain:

(df.groupby('date')['id'].count()
   .reset_index()
   .rename({'id': 'count'}, axis=1)
   .merge(df)
   .pipe(lambda x: x.assign(revenue=x['revenue'] / x['count']))
   .groupby(['date', 'type'])
   .agg({'revenue': 'sum'})
   .reset_index())
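An alternative sketch (not from the answer above) that skips the count-and-merge round trip: groupby transform('size') broadcasts each date's row count back onto every row, so the division can happen in place.

```python
import pandas as pd

# Rebuild the question's sample data.
df = pd.DataFrame({
    'date': ['2021-09-01'] * 8 + ['2021-09-02'],
    'id':   ['Zw', 'Aw', 'Zc', 'ww', 'jw', 'pw', 'fg', 'kl', 'mm'],
    'type': ['b1', 'c', 'c', 'b', 'c', 'b', 'c', 'b', 'b'],
    'revenue': [20.045350, 8.990000, 14.990000, 25.944510,
                3.881649, 9.990000, 2.990000, 4.990000, 7.990000],
})

# transform('size') returns a Series aligned with df, holding each
# row's date-group size; no intermediate merge is needed.
out = (df.assign(revenue=df['revenue'] / df.groupby('date')['revenue'].transform('size'))
         .groupby(['date', 'type'], as_index=False)['revenue'].sum())
print(out)
```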

CodePudding user response:

Do a groupby sum, then divide it by the per-date row counts:

(df.groupby(['type', 'date'])
   .revenue
   .sum()
   .div(df.date.value_counts(), level='date')
)
type  date
b     2021-09-01    5.115564
      2021-09-02    7.990000
b1    2021-09-01    2.505669
c     2021-09-01    3.856456
dtype: float64

Explanation:

  • Get the row counts for each date:

    counts = df.date.value_counts()

  • Get the revenue sums, grouped by type and date:

    revenue_sum = df.groupby(['type', 'date']).revenue.sum()

Divide revenue_sum by counts, using the date level:

revenue_sum.div(counts, level='date')

type  date
b     2021-09-01    5.115564
      2021-09-02    7.990000
b1    2021-09-01    2.505669
c     2021-09-01    3.856456
dtype: float64
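The result above is a Series with a (type, date) MultiIndex. For completeness, it can be flattened into a DataFrame shaped like the asker's desired output (a self-contained sketch rebuilding the sample data inline):

```python
import pandas as pd

# Rebuild the question's sample data.
df = pd.DataFrame({
    'date': ['2021-09-01'] * 8 + ['2021-09-02'],
    'id':   ['Zw', 'Aw', 'Zc', 'ww', 'jw', 'pw', 'fg', 'kl', 'mm'],
    'type': ['b1', 'c', 'c', 'b', 'c', 'b', 'c', 'b', 'b'],
    'revenue': [20.045350, 8.990000, 14.990000, 25.944510,
                3.881649, 9.990000, 2.990000, 4.990000, 7.990000],
})

# Same double-groupby division; rename restores the column name that
# the division drops, reset_index flattens the MultiIndex.
result = (df.groupby(['type', 'date'])['revenue']
            .sum()
            .div(df['date'].value_counts(), level='date')
            .rename('revenue')
            .reset_index()
            [['date', 'type', 'revenue']])
print(result)
```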