Home > Software engineering >  Different aggregate function based on value of column pandas
Different aggregate function based on value of column pandas

Time:10-31

I have the following dataframe

import pandas as pd
test = pd.DataFrame({'y':[1,2,3,4,5,6], 'label': ['bottom', 'top','bottom', 'top','bottom', 'top']})

y   label
0   1   bottom
1   2   top
2   3   bottom
3   4   top
4   5   bottom
5   6   top

I would like to add a new column, agg_y, which would be the the max(y) if label=="bottom" and min(y) if label=="top". I have tried this

test['min_y'] = test.groupby('label').y.transform('min')
test['max_y'] = test.groupby('label').y.transform('max')
test['agg_y'] = np.where(test.label == "bottom", test.max_y, test.min_y)
test.drop(columns=['min_y', 'max_y'], inplace=True)

which gives the correct result

y   label   agg_y
0   1   bottom  5
1   2   top 2
2   3   bottom  5
3   4   top 2
4   5   bottom  5
5   6   top 2

I am just looking fora one-liner solution, if possible

CodePudding user response:

Your solution in one line solution is:

test['agg_y'] = np.where(test.label == "bottom",
                         test.groupby('label').y.transform('max'), 
                         test.groupby('label').y.transform('min'))

Solution without groupby, thank you @ouroboros1:

test['agg_y'] = np.where(test.label == 'bottom', 
                         test.loc[test.label.eq('bottom'), 'y'].max(), 
                         test.loc[test.label.ne('bottom'), 'y'].min())

Another idea is mapping values, idea is similar like ouroboros1 solution:

d = {'bottom':'max', 'top':'min'}
test['agg_y'] = test['label'].map({val:test.loc[test.label.eq(val),'y'].agg(func) 
                                   for val, func in d.items()})

print (test)
   y   label  agg_y
0  1  bottom      5
1  2     top      2
2  3  bottom      5
3  4     top      2
4  5  bottom      5
5  6     top      2
  • Related