for a dataframe of:
import pandas as pd
df = pd.DataFrame({
'trigger':[0,0,0, 1,1,1, 2,2,2, 3,3,3,],
'score' :[1,0,0, 0,1,0 ,0,0,1 ,1,1,1],
'label' :[1,0,0, 0,1,0 ,0,0,1 ,1,1,1]
})
# in reality ranked using some other column
df['rank'] = df.groupby(['trigger']).cumcount()
display(df)
I almost want to compute:
d_eval = df[df['rank'] <=2]
d_eval.groupby(['trigger']).agg({'score':'max', 'label':'max'})
However, I do not want to loose the other values (not included in the rank-filter), but only consider the filtered ones in the aggregation.
Is there another way than:
- filtering
- joining the result back
- continue
more straight forward in pandas?
CodePudding user response:
One option is merge
:
d_eval = (df[df['rank'] <=2].groupby(['trigger'])
.agg({'score':'max', 'label':'max'})
)
df.merge(d_eval, on='trigger', suffixes=['','_max'])
Output:
trigger score label rank score_max label_max
0 0 1 1 0 1 1
1 0 0 0 1 1 1
2 0 0 0 2 1 1
3 1 0 0 0 1 1
4 1 1 1 1 1 1
5 1 0 0 2 1 1
6 2 0 0 0 1 1
7 2 0 0 1 1 1
8 2 1 1 2 1 1
9 3 1 1 0 1 1
10 3 1 1 1 1 1
11 3 1 1 2 1 1
Or sort-of one-liner
df.merge(df.assign(rank=df.groupby('trigger').cumcount())
.query('rank <=2')
.groupby('trigger')[['score','label']].max(),
on='trigger', suffixes=['','_max']
)