I want to feed a list of parameters to generate different aggregate functions in pandas. To make this more concrete, let's say I have this data:
import numpy as np
import pandas as pd
np.random.seed(0)
df_data = pd.DataFrame({
    'group': np.repeat(['x', 'y'], 10),
    'val': np.random.randint(0, 10, 20)
})
So the first few rows of the data look like this:
group | val |
---|---|
x | 5 |
x | 0 |
x | 3 |
I have a list of per-group percentiles that I want to compute.
percentile_list = [10, 90]
I tried to do this with a dictionary comprehension that builds a pd.NamedAgg around a lambda function for each percentile.
df_agg = df_data.groupby('group').agg(
    **{f'p{y}_by_dict': pd.NamedAgg('val', lambda x: np.quantile(x, y / 100)) for y in percentile_list},
)
But it doesn't work. Here I calculate both by hand and by dictionary comprehension.
df_agg = df_data.groupby('group').agg(
    p10_by_hand=pd.NamedAgg('val', lambda x: np.quantile(x, 0.1)),
    p90_by_hand=pd.NamedAgg('val', lambda x: np.quantile(x, 0.9)),
    **{f'p{y}_by_dict': pd.NamedAgg('val', lambda x: np.quantile(x, y / 100)) for y in percentile_list},
)
The result looks like this. The manually specified aggregations work but the dictionary comprehension ones have the same values across different aggregations. I guess they just took the last lambda function in the generated dictionary.
group | p10_by_hand | p90_by_hand | p10_by_dict | p90_by_dict |
---|---|---|---|---|
x | 1.8 | 7.2 | 7.2 | 7.2 |
y | 1.0 | 8.0 | 8.0 | 8.0 |
How do I fix this? I don't have to use dictionary comprehension, as long as each aggregation can be specified programmatically.
Answer:
In [23]: def agg_gen(y):
    ...:     return lambda x: np.quantile(x, y / 100)
    ...:

In [24]: df_data.groupby('group').agg(
    ...:     **{f'p{y}_by_dict': pd.NamedAgg('val', agg_gen(y)) for y in percentile_list},
    ...: )
Out[24]:
       p10_by_dict  p90_by_dict
group
x              1.8          7.2
y              1.0          8.0
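If you would rather keep everything inside the comprehension, another common option is to bind the current value through a default argument. This is only a sketch that reuses the data from the question; the `_by_default` suffix and the parameter name `q` are names I picked for illustration.

```python
import numpy as np
import pandas as pd

np.random.seed(0)
df_data = pd.DataFrame({
    'group': np.repeat(['x', 'y'], 10),
    'val': np.random.randint(0, 10, 20),
})
percentile_list = [10, 90]

# The default q=y / 100 is evaluated when each lambda is created,
# so every lambda keeps its own quantile instead of sharing the last y.
df_agg = df_data.groupby('group').agg(
    **{
        f'p{y}_by_default': pd.NamedAgg('val', lambda x, q=y / 100: np.quantile(x, q))
        for y in percentile_list
    },
)
print(df_agg)
```

pandas calls each aggregation with the group's values only, so the default for `q` is never overridden.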
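Your original attempt fails because of how closures capture variables: each lambda looks up y when it is called, not when it is defined. By the time agg runs, the comprehension has finished and y is 90 for every lambda, so both columns hold the 90th percentile. Calling agg_gen(y) creates a new scope per call, so each lambda gets its own y. See the question "What do lambda function closures capture?" for details.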
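The same behaviour shows up without pandas. Here is a small sketch with toy multiplier functions standing in for the quantile lambdas; `late`, `bound` and `make_scaler` are made-up names for the example.

```python
# Late binding: each lambda looks up y when it is called, and by then the
# comprehension has finished, so y is 90 for both entries.
late = {f'p{y}': (lambda x: x * y) for y in [10, 90]}
print(late['p10'](1), late['p90'](1))  # 90 90


# A factory function gives each lambda its own y, the same idea as agg_gen.
def make_scaler(y):
    return lambda x: x * y


bound = {f'p{y}': make_scaler(y) for y in [10, 90]}
print(bound['p10'](1), bound['p90'](1))  # 10 90
```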