How do I use `pd.NamedAgg` with a lambda function inside a `pandas` aggregation?

Time:12-21

I want to feed a list of parameters to pandas and generate a different aggregate function for each one. To make this concrete, say I have this data:

import numpy as np
import pandas as pd

np.random.seed(0)
df_data = pd.DataFrame({
    'group': np.repeat(['x', 'y'], 10),
    'val': np.random.randint(0, 10, 20)
})

So the first few rows of the data look like this:

group  val
x      5
x      0
x      3

I have a list of per-group percentiles that I want to compute.

percentile_list = [10, 90]

I tried to do this with a dictionary comprehension that builds one `pd.NamedAgg` per percentile, each calling a lambda function:

df_agg = df_data.groupby('group').agg(
    **{f'p{y}_by_dict': pd.NamedAgg('val', lambda x: np.quantile(x, y / 100)) for y in percentile_list},
)

But it doesn't work. Here I calculate the percentiles both by hand and with the dictionary comprehension:

df_agg = df_data.groupby('group').agg(
    p10_by_hand=pd.NamedAgg('val', lambda x: np.quantile(x, 0.1)),
    p90_by_hand=pd.NamedAgg('val', lambda x: np.quantile(x, 0.9)),
    **{f'p{y}_by_dict': pd.NamedAgg('val', lambda x: np.quantile(x, y / 100)) for y in percentile_list},
)

The result looks like this. The manually specified aggregations work, but the dictionary-comprehension ones produce the same values for every percentile. I guess they all just used the last lambda function in the generated dictionary.

       p10_by_hand  p90_by_hand  p10_by_dict  p90_by_dict
group
x              1.8          7.2          7.2          7.2
y              1.0          8.0          8.0          8.0
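The symptom can be reproduced without pandas. In this plain-Python sketch, the lambda is a hypothetical stand-in for the `np.quantile` call that simply returns the percentile it would use. It shows that the comprehension does create separate lambda objects, yet all of them report the last value of `y`.

```python
# Plain-Python reproduction of the dictionary comprehension above.
# Hypothetical stand-in: each "aggregation" ignores its input and returns
# the quantile it would pass to np.quantile, so we can see which y it uses.
percentile_list = [10, 90]

funcs = {f'p{y}_by_dict': (lambda x: y / 100) for y in percentile_list}

# Two distinct lambda objects were created...
print(funcs['p10_by_dict'] is funcs['p90_by_dict'])  # False

# ...but both look up y only when called, after the loop has finished.
print({name: f(None) for name, f in funcs.items()})
# {'p10_by_dict': 0.9, 'p90_by_dict': 0.9}
```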

How do I fix this? I don't have to use dictionary comprehension, as long as each aggregation can be specified programmatically.

Answer:

In [23]: def agg_gen(y):
    ...:     return lambda x: np.quantile(x, y / 100)
    ...:

In [24]:  df_data.groupby('group').agg(
    ...:     **{f'p{y}_by_dict': pd.NamedAgg('val', agg_gen(y)) for y in percentile_list},
    ...: )
Out[24]:
       p10_by_dict  p90_by_dict
group
x              1.8          7.2
y              1.0          8.0

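Your initial attempt fails for the reason explained in the question "What do lambda function closures capture?". A lambda does not store the value of `y` when it is created; it looks `y` up when it is called. pandas only calls the lambdas after the dictionary comprehension has finished, and by then `y` is 90 for all of them. Each call to `agg_gen(y)` creates a new scope, so every returned lambda keeps its own `y`.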
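If you would rather keep the aggregation inline than write a factory like `agg_gen`, another common idiom (not part of the original answer) is to bind the value as a default argument. Defaults are evaluated once, when each lambda is defined. The following is a minimal sketch that reuses the question's data. The parameter name `q` is chosen here only for illustration.

```python
import numpy as np
import pandas as pd

# Same sample data as in the question.
np.random.seed(0)
df_data = pd.DataFrame({
    'group': np.repeat(['x', 'y'], 10),
    'val': np.random.randint(0, 10, 20),
})
percentile_list = [10, 90]

# q=y / 100 is evaluated when each lambda is created, so every lambda keeps
# its own quantile instead of reading y after the comprehension has finished.
df_agg = df_data.groupby('group').agg(
    **{
        f'p{y}_by_dict': pd.NamedAgg('val', lambda x, q=y / 100: np.quantile(x, q))
        for y in percentile_list
    },
)
print(df_agg)
```

pandas calls each aggregation with just the group's values, so `q` keeps its default. The printed numbers should match the `agg_gen` output in the answer.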
