Given a table like this:
A | B | C |
---|---|---|
t | r | 1 |
t | r | 1 |
n | j | 2 |
n | j | 2 |
n | j | 2 |
I would like to group on A and B and keep, from each group, only the number of rows specified by C.
So the desired output would be:
A | B | C |
---|---|---|
t | r | 1 |
n | j | 2 |
n | j | 2 |
I am trying to achieve that with the following, but with no luck:
df.groupby(['A', 'B']).agg(lambda x: x.head(df.C))
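For anyone who wants to reproduce this, the example table can be built roughly as follows. This is only a minimal sketch: the name df matches the snippet above, and it assumes C is an integer column.

```python
import pandas as pd

# Example table from the question; C is assumed to hold integers.
df = pd.DataFrame({
    "A": ["t", "t", "n", "n", "n"],
    "B": ["r", "r", "j", "j", "j"],
    "C": [1, 1, 2, 2, 2],
})
print(df)
```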
CodePudding user response:
You can use groupby.cumcount and boolean indexing:
out = df[df['C'].gt(df.groupby(['A', 'B']).cumcount())]
Or with a classical groupby.apply:
(df.groupby(['A', 'B'], sort=False, as_index=False, group_keys=False)
.apply(lambda g: g.head(g['C'].iloc[0]))
)
Output:
   A  B  C
0  t  r  1
2  n  j  2
3  n  j  2
Intermediates for the groupby.cumcount approach:
   A  B  C  cumcount  C > cumcount
0  t  r  1         0          True
1  t  r  1         1         False
2  n  j  2         0          True
3  n  j  2         1          True
4  n  j  2         2         False
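Putting the cumcount idea together, here is a self-contained sketch. The helper name take_first_c_rows and its parameters are made up for illustration, and the data is the example table from the question.

```python
import pandas as pd

# Example table from the question.
df = pd.DataFrame({
    "A": ["t", "t", "n", "n", "n"],
    "B": ["r", "r", "j", "j", "j"],
    "C": [1, 1, 2, 2, 2],
})


def take_first_c_rows(frame, keys=("A", "B"), count_col="C"):
    """Hypothetical helper: keep the first `count_col` rows of each group."""
    # 0-based position of each row within its (A, B) group
    position = frame.groupby(list(keys)).cumcount()
    # keep a row only while its position is below the requested count
    return frame[frame[count_col].gt(position)]


out = take_first_c_rows(df)
print(out)
```

One difference worth keeping in mind: this compares every row against its own C, whereas the groupby.apply variant reads C from the first row of each group, so the two can disagree if C is not constant within a group.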
CodePudding user response:
The following seems to work and is based on GroupBy.nth:
df[::-1].groupby(['A', 'B'], as_index=False).nth(df['C'])
Output:
   A  B  C
3  n  j  2
2  n  j  2
0  t  r  1