I have the following dataframe
import pandas as pd
test = pd.DataFrame({'y':[1,2,3,4,5,6], 'label': ['bottom', 'top','bottom', 'top','bottom', 'top']})
y label
0 1 bottom
1 2 top
2 3 bottom
3 4 top
4 5 bottom
5 6 top
I would like to add a new column, agg_y
, which would be the the max(y)
if label=="bottom"
and min(y)
if label=="top"
. I have tried this
test['min_y'] = test.groupby('label').y.transform('min')
test['max_y'] = test.groupby('label').y.transform('max')
test['agg_y'] = np.where(test.label == "bottom", test.max_y, test.min_y)
test.drop(columns=['min_y', 'max_y'], inplace=True)
which gives the correct result
y label agg_y
0 1 bottom 5
1 2 top 2
2 3 bottom 5
3 4 top 2
4 5 bottom 5
5 6 top 2
I am just looking fora one-liner solution, if possible
CodePudding user response:
Your solution in one line solution is:
test['agg_y'] = np.where(test.label == "bottom",
test.groupby('label').y.transform('max'),
test.groupby('label').y.transform('min'))
Solution without groupby
, thank you @ouroboros1:
test['agg_y'] = np.where(test.label == 'bottom',
test.loc[test.label.eq('bottom'), 'y'].max(),
test.loc[test.label.ne('bottom'), 'y'].min())
Another idea is mapping values, idea is similar like ouroboros1 solution:
d = {'bottom':'max', 'top':'min'}
test['agg_y'] = test['label'].map({val:test.loc[test.label.eq(val),'y'].agg(func)
for val, func in d.items()})
print (test)
y label agg_y
0 1 bottom 5
1 2 top 2
2 3 bottom 5
3 4 top 2
4 5 bottom 5
5 6 top 2