Proportion of total in pd.NamedAgg

Time:08-18

I have a df like this:

Foo Bar
A 54
A 95
B 49
A 25
B 25
B 32
B 55
A 98
A 83
B 90

I am computing some per-group statistics with groupby().agg() and pd.NamedAgg (see [aggfunc][1]):

import pandas as pd

temp = pd.DataFrame.from_dict({'Foo': {0: 'A',
  1: 'A',
  2: 'B',
  3: 'A',
  4: 'B',
  5: 'B',
  6: 'B',
  7: 'A',
  8: 'A',
  9: 'B'},
 'Bar': {0: 54, 1: 95, 2: 49, 3: 25, 4: 25, 5: 32, 6: 55, 7: 98, 8: 83, 9: 90}})

res_df = temp.groupby('Foo').agg(
    max_val = pd.NamedAgg(column='Bar', aggfunc='max'),
    min_val = pd.NamedAgg(column='Bar', aggfunc='min'),
    std_val = pd.NamedAgg(column='Bar', aggfunc='std')
)

Which results (correctly) in:

Foo max_val min_val std_val
A 98 25 31.0403
B 90 25 25.37124

However, I want to add another column, pct_total, holding each group's share of the overall total of Bar, for example:

Foo max_val min_val std_val pct_total
A 98 25 31.0403 0.59
B 90 25 25.37124 0.41

How can I accomplish this using aggfunc?

[1]: https://pandas.pydata.org/docs/reference/api/pandas.pivot_table.html
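The target column is each group's sum of Bar divided by the grand total of Bar: A sums to 355 and B to 251, out of 606 overall, which gives roughly 0.59 and 0.41. A minimal sketch checking that arithmetic outside of agg (the list-based constructor is just a shorter way to rebuild the same temp data):

```python
import pandas as pd

# Same data as temp above, built from lists instead of nested dicts
temp = pd.DataFrame({
    'Foo': list('AABABBBAAB'),
    'Bar': [54, 95, 49, 25, 25, 32, 55, 98, 83, 90],
})

group_sums = temp.groupby('Foo')['Bar'].sum()  # A: 355, B: 251
grand_total = temp['Bar'].sum()                # 606
pct_total = group_sums / grand_total           # each group's share of the total
print(pct_total.round(2))
```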

Answer:

You can add a fourth NamedAgg whose aggfunc is a lambda that divides each group's sum by the sum of the whole Bar column:

res_df = temp.groupby('Foo').agg(
    max_val = pd.NamedAgg(column='Bar', aggfunc='max'),
    min_val = pd.NamedAgg(column='Bar', aggfunc='min'),
    std_val = pd.NamedAgg(column='Bar', aggfunc='std'),
    pct_total = pd.NamedAgg(column='Bar', aggfunc=lambda x: x.sum()/temp.Bar.sum())
)

print(res_df)

     max_val  min_val    std_val  pct_total
Foo                                        
A         98       25  31.040296   0.585809
B         90       25  25.371244   0.414191
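The lambda above evaluates temp.Bar.sum() again for every group. A minimal variant, assuming temp is not modified between the two statements, computes the total once and lets the lambda close over it (total is just a local name chosen for illustration):

```python
import pandas as pd

temp = pd.DataFrame({
    'Foo': list('AABABBBAAB'),
    'Bar': [54, 95, 49, 25, 25, 32, 55, 98, 83, 90],
})

# Grand total of Bar, computed once instead of once per group
total = temp['Bar'].sum()

res_df = temp.groupby('Foo').agg(
    max_val=pd.NamedAgg(column='Bar', aggfunc='max'),
    min_val=pd.NamedAgg(column='Bar', aggfunc='min'),
    std_val=pd.NamedAgg(column='Bar', aggfunc='std'),
    # x holds one group's Bar values; divide its sum by the grand total
    pct_total=pd.NamedAgg(column='Bar', aggfunc=lambda x: x.sum() / total),
)
print(res_df)
```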

If you want the values rounded (0.59 and 0.41), use round(x.sum()/temp.Bar.sum(), 2) inside the lambda instead.
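Another option, which keeps temp out of the lambda entirely, is to aggregate the group sums as an ordinary named column and divide afterwards. This sketch assumes Foo has no missing keys: groupby drops NaN keys by default, in which case the sum of the group sums would be smaller than temp.Bar.sum(). The helper column name sum_val is arbitrary.

```python
import pandas as pd

temp = pd.DataFrame({
    'Foo': list('AABABBBAAB'),
    'Bar': [54, 95, 49, 25, 25, 32, 55, 98, 83, 90],
})

res_df = temp.groupby('Foo').agg(
    max_val=pd.NamedAgg(column='Bar', aggfunc='max'),
    min_val=pd.NamedAgg(column='Bar', aggfunc='min'),
    std_val=pd.NamedAgg(column='Bar', aggfunc='std'),
    sum_val=pd.NamedAgg(column='Bar', aggfunc='sum'),  # helper column
)

# The group sums add up to the grand total, so each share is sum / total of sums
res_df['pct_total'] = res_df['sum_val'] / res_df['sum_val'].sum()
res_df = res_df.drop(columns='sum_val')
print(res_df.round({'pct_total': 2}))
```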
