I want to calculate difference between max and min value for each group in pandas dataframe. For example, I wrote a code that calculates standard deviation for each group.
import pandas as pd
import statistics
months = ["Jan", "Jan", "Jan", "Jan","Feb","Feb","Feb", "Feb", "Mar", "Mar", "Mar", "Mar"]
values_1 = [1,2,7,1,3,8,9,4,7,4,11,1]
values_2 = [8,6,6,5,6,6,8,4,9,9,10,9]
df = pd.DataFrame({"Month": months,
"values_1": values_1,
"values_2": values_2})
print(df)
df = df.groupby('Month')[['values_1', 'values_2']].std().round(2)
df = df.reset_index(drop = False)
print(df)
Result looks like this:
Month values_1 values_2
0 Feb 2.94 1.63
1 Jan 2.87 1.26
2 Mar 4.27 0.50
Now I want to modify the code so that I get the same type/looks of the result, but with max-min values instead of standard deviation.
CodePudding user response:
You can do this:
groups = df.groupby('Month')[['values_1', 'values_2']]
df = groups.max() - groups.min()
CodePudding user response:
there is numpy's peak to peak for that also
df.groupby('Month').agg(np.ptp)
values_1 values_2
Month
Feb 6 4
Jan 6 3
Mar 10 1
CodePudding user response:
To get the difference between max and min for each group, try:
import pandas as pd
months = ["Jan", "Jan", "Jan", "Jan","Feb","Feb","Feb", "Feb", "Mar", "Mar", "Mar", "Mar"]
values_1 = [1,2,7,1,3,8,9,4,7,4,11,1]
values_2 = [8,6,6,5,6,6,8,4,9,9,10,9]
df = pd.DataFrame({"Month": months,
"values_1": values_1,
"values_2": values_2})
def process(s):
return s.max() - s.min()
df = process(df.groupby('Month')[['values_1', 'values_2']])
df = df.reset_index(drop = False)
Month values_1 values_2
0 Feb 6 4
1 Jan 6 3
2 Mar 10 1
Or simply:
df = df.groupby('Month')[['values_1', 'values_2']].apply(lambda g: g.max() - g.min())
CodePudding user response:
Update: added numpy.ptp
IIUC, you can try using agg
with a list of aggregating functions like this:
df = df.groupby('Month')[['values_1', 'values_2']].agg(['min', 'max', 'std', np.ptp]).round(2)
Output:
Month values_1 values_2
min max std ptp min max std ptp
0 Feb 3 9 2.94 6 4 8 1.63 4
1 Jan 1 7 2.87 6 5 8 1.26 3
2 Mar 1 11 4.27 10 9 10 0.50 1