Calculate difference between min and max values in a group after a groupby

Time:08-24

Using the DataFrame (test) below, I calculate the mean of 'value' for each ('col1', 'col2') group. After that, I would like to group again by 'col1' only and calculate the difference between the max and min values of the 'mean' column created by the first groupby.

Is there an elegant way to do this?

import pandas as pd

test = pd.DataFrame({'col1': ['B', 'A', 'A', 'B', 'B', 'C', 'C', 'A', 'A', 'B', 'B', 'C', 'C', 'B', 'C', 'C', 'A'],
                     'col2': ['W', 'L', 'W', 'L', 'W', 'L', 'L', 'L', 'W', 'L', 'W', 'L', 'L', 'W', 'W', 'L', 'L'],
                     'value': [32, 54, 65, 24, 54, 39, 76, 51, 21, 4, 46, 73, 59, 23, 43, 23, 12]})

# Per ('col1', 'col2') group: row count and mean of 'value'
print(test.groupby(['col1', 'col2'])[['value']].agg(
    n=('value', 'count'),
    mean=('value', 'mean')))

Answer:

You can aggregate with the numpy.ptp function ("peak to peak", i.e. max minus min):

import numpy as np

(test.groupby(['col1', 'col2'])[['value']]
     .agg(n=('value', 'count'),  # not needed for the final result
          mean=('value', 'mean'))
     .groupby('col1').agg(diff=('mean', np.ptp))  # regroup on the 'col1' index level
)

Alternative: use lambda g: g.max() - g.min() as the aggregation function.

Output:

       diff
col1       
A      4.00
B     24.75
C     11.00
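
For reference, the lambda alternative could look something like the sketch below. It skips the unused count column and works on a Series of group means instead of a DataFrame; the intermediate name means is only for illustration and is not part of the original code.

```python
import pandas as pd

test = pd.DataFrame({'col1': ['B', 'A', 'A', 'B', 'B', 'C', 'C', 'A', 'A', 'B', 'B', 'C', 'C', 'B', 'C', 'C', 'A'],
                     'col2': ['W', 'L', 'W', 'L', 'W', 'L', 'L', 'L', 'W', 'L', 'W', 'L', 'L', 'W', 'W', 'L', 'L'],
                     'value': [32, 54, 65, 24, 54, 39, 76, 51, 21, 4, 46, 73, 59, 23, 43, 23, 12]})

# Step 1: mean of 'value' per ('col1', 'col2') pair.
# The result is a Series indexed by a ('col1', 'col2') MultiIndex.
means = test.groupby(['col1', 'col2'])['value'].mean()

# Step 2: regroup on the 'col1' index level and take max - min of the means.
diff = (means.groupby(level='col1')
             .agg(lambda g: g.max() - g.min())
             .rename('diff'))

print(diff)
```

This should give the same three values as the np.ptp version (4.0 for A, 24.75 for B, 11.0 for C), just as a Series rather than a one-column DataFrame. Call .to_frame() on it if you need the DataFrame shape.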