I'm having difficulties applying customs functions to a groupby
operation in pandas. Let's suppose that I have the following DataFrame to work with:
import pandas as pd
df = pd.DataFrame(
{
"id": [1, 1, 2, 2],
"flag": ["A", "A", "B", "B"],
"value1": [520, 250, 180, 360],
"value2": [11, 5, 7, 2],
}
)
print(df)
id flag value1 value2
0 1 A 520 11
1 1 A 250 5
2 2 B 180 7
3 2 B 360 2
I need to apply 4 aggregate functions to the above DataFrame grouped by id
and flag
. Specifically, for each id
and flag
:
- Calculate the mean of
value1
; - Calculate the sum of
value2
; - Calculate the mean of
(value1 * value2) / 12
; - Calculate the sum of
(value1 / value2)
.
I don't have any issues with the first two. This is what I did to calculate them:
df.groupby(["id", "flag"]).agg({"value1": ["mean"], "value2": ["sum"]})
value1 value2
mean sum
id flag
1 A 385.0 16
2 B 270.0 9
My problems are related to the last two aggregates. I search here for similar problems and I think I need to create two custom functions and apply them to the groupby
object. Unfortunately, all my attempts failed and I wasn't able to work this out. Also, if possible, I want all results to be in a single DataFrame like below (hopefully, I've calculated the numbers correctly):
value1 value2
mean sum func1 func2
id flag
1 A 385.0 16 290.42 97.27
2 B 270.0 9 82.5 205.71
CodePudding user response:
groupby().agg.
only takes in values of one columns.
With custom functions involving several columns, I would do something like this:
groupby = df.groupby(['id','flag'])
out = pd.DataFrame({
'value1': groupby['value1'].mean(),
'value2': groupby['value2'].sum(),
'value3': groupby.apply(lambda x: (x['value1'] * x['value2']).mean()/12),
'value4': groupby.apply(lambda x: (x['value1']/x['value2']).sum())
})
Output:
value1 value2 value3 value4
id flag
1 A 385 16 290.416667 97.272727
2 B 270 9 82.500000 205.714286
CodePudding user response:
What I'd do in this case is store (value1 * value2) / 12
and (value1 / value2)
in temporary columns, and then aggregate:
agg_funcs = {
'value1': 'mean',
'value2': 'sum',
'func1': 'mean',
'func2': 'sum',
}
agg = (
df.assign(
func1=lambda x: (x['value1'] * x['value2']) / 12,
func2=lambda x: (x['value1'] / x['value2'])
)
.groupby(["id", "flag"])
.agg(agg_funcs)
)
Output:
>>> agg
value1 value2 func1 func2
id flag
1 A 385.0 16 290.416667 97.272727
2 B 270.0 9 82.500000 205.714286
Obligatory one-liner:
df.assign(func1=lambda x: (x.value1 * x.value2) / 12, func2=lambda x: (x.value1 / x.value2)).groupby(["id","flag"]).agg({'value1': 'mean', 'value2': 'sum', 'func1': 'mean', 'func2': 'sum'})