How to use column value as parameter in aggregation function in pandas?

Time:10-06

Given a table like this:

A B C
t r 1
t r 1
n j 2
n j 2
n j 2

I would like to group by A and B and, for each group, keep only the number of rows specified by C.

So the desired output would be:

A B C
t r 1
n j 2
n j 2

I tried to achieve that with the following call, but with no luck:

df.groupby(['A', 'B']).agg(lambda x: x.head(df.C))

CodePudding user response:

You can use groupby.cumcount and boolean indexing:

out = df[df['C'].gt(df.groupby(['A', 'B']).cumcount())]

Or with a classical groupby.apply:

(df.groupby(['A', 'B'], sort=False, as_index=False, group_keys=False)
   .apply(lambda g: g.head(g['C'].iloc[0]))
)

Output:

   A  B  C
0  t  r  1
2  n  j  2
3  n  j  2
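The groupby.apply variant can also be spelled out as an explicit loop over the groups, which may be useful because newer pandas releases have changed how apply handles the grouping columns. This is only a sketch, not the answerer's code: the frame is rebuilt from the question's table, and `parts` is a hypothetical name.

```python
import pandas as pd

# Sample frame from the question
df = pd.DataFrame({
    "A": ["t", "t", "n", "n", "n"],
    "B": ["r", "r", "j", "j", "j"],
    "C": [1, 1, 2, 2, 2],
})

# For each (A, B) group, keep the first C rows.
# C is assumed to be constant within a group, so its first value is used.
parts = [
    g.head(int(g["C"].iloc[0]))
    for _, g in df.groupby(["A", "B"], sort=False)
]

# Stitch the per-group slices back together, keeping the original index
out = pd.concat(parts)
print(out)
```

The loop is slower than the cumcount approach on large frames, but it makes the per-group logic explicit.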

Intermediates for the groupby.cumcount approach:

   A  B  C  cumcount  C > cumcount
0  t  r  1         0          True
1  t  r  1         1         False
2  n  j  2         0          True
3  n  j  2         1          True
4  n  j  2         2         False
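The intermediates table can be reproduced with a short script. This is a minimal sketch built on the question's sample data; the names `pos`, `mask`, `steps` and the `keep` column are chosen here for illustration.

```python
import pandas as pd

# Sample frame from the question
df = pd.DataFrame({
    "A": ["t", "t", "n", "n", "n"],
    "B": ["r", "r", "j", "j", "j"],
    "C": [1, 1, 2, 2, 2],
})

# 0-based position of each row within its (A, B) group
pos = df.groupby(["A", "B"]).cumcount()

# A row is kept while its position is still below the group's C value
mask = df["C"].gt(pos)
out = df[mask]

# Side-by-side view of the intermediate columns
steps = df.assign(cumcount=pos, keep=mask)
print(steps)
print(out)
```

Because the comparison is done row by row, this also works if C differs between groups, with no Python-level loop.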

CodePudding user response:

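The following seems to work on this sample and is based on GroupBy.nth. Note that nth appears to treat the values of df['C'] as a list of positions applied to every group, not as a per-group row count, so the match with the desired output may not carry over to other data: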

df[::-1].groupby(['A', 'B'], as_index=False).nth(df['C'])

Output:

   A  B  C
3  n  j  2
2  n  j  2
0  t  r  1
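Whether the GroupBy.nth one-liner generalizes can be checked on different data. The sketch below assumes that nth, given a list, selects those positions in every group; to stay independent of how a Series argument is handled, the distinct C values are passed as a plain list. The frame `df2` and the names `by_cumcount`, `by_nth` and `positions` are hypothetical.

```python
import pandas as pd

# Hypothetical data: group (x, y) has 4 rows with C=3,
# group (p, q) has 2 rows with C=1
df2 = pd.DataFrame({
    "A": ["x"] * 4 + ["p"] * 2,
    "B": ["y"] * 4 + ["q"] * 2,
    "C": [3] * 4 + [1] * 2,
})

# Per-group row count via cumcount: keeps the first C rows of each group
by_cumcount = df2[df2["C"].gt(df2.groupby(["A", "B"]).cumcount())]

# nth with a list applies the same positions (here 1 and 3) to every group
positions = sorted(int(c) for c in df2["C"].unique())
by_nth = df2[::-1].groupby(["A", "B"], as_index=False).nth(positions)

# The two approaches select a different number of rows on this data
print(len(by_cumcount), len(by_nth))
```

On this frame the two results differ in length, which suggests the nth version matches the desired output on the question's sample only because of how that sample happens to be laid out.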