Home > OS >  Calculate difference between min and max for each Pandas group
Calculate difference between min and max for each Pandas group

Time:02-14

I want to calculate difference between max and min value for each group in pandas dataframe. For example, I wrote a code that calculates standard deviation for each group.

import pandas as pd
import statistics

months = ["Jan", "Jan", "Jan", "Jan","Feb","Feb","Feb", "Feb", "Mar", "Mar", "Mar", "Mar"]
values_1 = [1,2,7,1,3,8,9,4,7,4,11,1]
values_2 = [8,6,6,5,6,6,8,4,9,9,10,9]
df = pd.DataFrame({"Month": months,
                   "values_1": values_1,
                   "values_2": values_2})
                   
print(df)

df = df.groupby('Month')[['values_1', 'values_2']].std().round(2)
df = df.reset_index(drop = False)
print(df)

Result looks like this:

    Month  values_1  values_2
0   Feb      2.94      1.63
1   Jan      2.87      1.26
2   Mar      4.27      0.50

Now I want to modify the code so that I get the same type/looks of the result, but with max-min values instead of standard deviation.

CodePudding user response:

You can do this:

groups = df.groupby('Month')[['values_1', 'values_2']]
df = groups.max() - groups.min()

CodePudding user response:

there is numpy's peak to peak for that also

df.groupby('Month').agg(np.ptp)

       values_1  values_2
Month
Feb           6         4
Jan           6         3
Mar          10         1

CodePudding user response:

To get the difference between max and min for each group, try:

import pandas as pd

months = ["Jan", "Jan", "Jan", "Jan","Feb","Feb","Feb", "Feb", "Mar", "Mar", "Mar", "Mar"]
values_1 = [1,2,7,1,3,8,9,4,7,4,11,1]
values_2 = [8,6,6,5,6,6,8,4,9,9,10,9]
df = pd.DataFrame({"Month": months,
                   "values_1": values_1,
                   "values_2": values_2})
                   
def process(s):
  return s.max() - s.min()
df = process(df.groupby('Month')[['values_1', 'values_2']])
df = df.reset_index(drop = False)
  Month  values_1  values_2
0   Feb         6         4
1   Jan         6         3
2   Mar        10         1

Or simply:

df = df.groupby('Month')[['values_1', 'values_2']].apply(lambda g: g.max() - g.min())

CodePudding user response:

Update: added numpy.ptp IIUC, you can try using agg with a list of aggregating functions like this:

df = df.groupby('Month')[['values_1', 'values_2']].agg(['min', 'max', 'std', np.ptp]).round(2)

Output:

  Month values_1               values_2              
             min max   std ptp      min max   std ptp
0   Feb        3   9  2.94   6        4   8  1.63   4
1   Jan        1   7  2.87   6        5   8  1.26   3
2   Mar        1  11  4.27  10        9  10  0.50   1
  • Related