I have a df
like this:
Foo | Bar |
---|---|
A | 54 |
A | 95 |
B | 49 |
A | 25 |
B | 25 |
B | 32 |
B | 55 |
A | 98 |
A | 83 |
B | 90 |
I am computing some statistics using [aggfunc
][1]:
temp = pd.DataFrame.from_dict({'Foo': {0: 'A',
1: 'A',
2: 'B',
3: 'A',
4: 'B',
5: 'B',
6: 'B',
7: 'A',
8: 'A',
9: 'B'},
'Bar': {0: 54, 1: 95, 2: 49, 3: 25, 4: 25, 5: 32, 6: 55, 7: 98, 8: 83, 9: 90}})
res_df = temp.groupby('Foo').agg(
max_val = pd.NamedAgg(column='Bar', aggfunc='max'),
min_val = pd.NamedAgg(column='Bar', aggfunc='min'),
std_val = pd.NamedAgg(column='Bar', aggfunc='std')
)
Which results (correctly) in:
Foo | max_val | min_val | std_val |
---|---|---|---|
A | 98 | 25 | 31.0403 |
B | 90 | 25 | 25.37124 |
However, I want to add another column which is proportion of total, so for example:
Foo | max_val | min_val | std_val | pct_total |
---|---|---|---|---|
A | 98 | 25 | 31.0403 | 0.59 |
B | 90 | 25 | 25.37124 | 0.41 |
How can I accomplish this using aggfunc
?
[1]: https://pandas.pydata.org/docs/reference/api/pandas.pivot_table.html
CodePudding user response:
You could try as follows:
res_df = temp.groupby('Foo').agg(
max_val = pd.NamedAgg(column='Bar', aggfunc='max'),
min_val = pd.NamedAgg(column='Bar', aggfunc='min'),
std_val = pd.NamedAgg(column='Bar', aggfunc='std'),
pct_total = pd.NamedAgg(column='Bar', aggfunc=lambda x: x.sum()/temp.Bar.sum())
)
print(res_df)
max_val min_val std_val pct_total
Foo
A 98 25 31.040296 0.585809
B 90 25 25.371244 0.414191
Or use round(x.sum()/temp.Bar.sum(),2)
if you want them rounded: 0.59, 0.41
.