Using the DataFrame (test) below, I calculate the mean of each ('col1', 'col2') group. After that, I would like to group again using only 'col1' and calculate the difference between the max and min values of the 'mean' column created by the first groupby.
How can this be done in an elegant way?
import pandas as pd

test = pd.DataFrame({'col1': ['B', 'A', 'A', 'B', 'B', 'C', 'C', 'A', 'A', 'B', 'B', 'C', 'C', 'B', 'C', 'C', 'A'],
                     'col2': ['W', 'L', 'W', 'L', 'W', 'L', 'L', 'L', 'W', 'L', 'W', 'L', 'L', 'W', 'W', 'L', 'L'],
                     'value': [32, 54, 65, 24, 54, 39, 76, 51, 21, 4, 46, 73, 59, 23, 43, 23, 12]})
# count and mean of 'value' for each ('col1', 'col2') pair
print(test.groupby(['col1', 'col2'])[['value']].agg(
    n=('value', 'count'),
    mean=('value', 'mean')))
Answer:
You can aggregate with the numpy.ptp function ("peak to peak", i.e. max minus min):
import numpy as np

(test.groupby(['col1', 'col2'])[['value']]
     .agg(n=('value', 'count'),   # not needed for the final result
          mean=('value', 'mean'))
     .groupby('col1').agg(diff=('mean', np.ptp))  # max - min of the group means
)
Alternative: use lambda g: g.max() - g.min() as the aggregation function.
Output:
       diff
col1
A      4.00
B     24.75
C     11.00
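For completeness, here is a minimal self-contained sketch of the lambda alternative. It assumes the count column is not needed, so the first step keeps only the mean as a Series. The names `means` and `result` are illustrative and not part of the original answer.

```python
import pandas as pd

test = pd.DataFrame({
    'col1': ['B', 'A', 'A', 'B', 'B', 'C', 'C', 'A', 'A', 'B', 'B', 'C', 'C', 'B', 'C', 'C', 'A'],
    'col2': ['W', 'L', 'W', 'L', 'W', 'L', 'L', 'L', 'W', 'L', 'W', 'L', 'L', 'W', 'W', 'L', 'L'],
    'value': [32, 54, 65, 24, 54, 39, 76, 51, 21, 4, 46, 73, 59, 23, 43, 23, 12],
})

# Step 1: mean of 'value' for each ('col1', 'col2') pair (a Series with a MultiIndex).
means = test.groupby(['col1', 'col2'])['value'].mean()

# Step 2: regroup on the 'col1' index level and take max - min of those means.
result = (
    means.groupby(level='col1')
         .agg(lambda g: g.max() - g.min())
         .to_frame('diff')
)
print(result)
```

This should give the same diff values as the numpy.ptp version (4.00, 24.75 and 11.00 for A, B and C) without depending on NumPy.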