I have the following pandas DataFrame:
df = pd.DataFrame({
'id': [1, 1, 1, 2],
'r': [1000, 1300, 1400, 1100],
's': [650, 720, 565, 600]
})
I'd like to aggregate the DataFrame and create a new column which is a range of r
values - 25th
and 75th
percentiles. The aggregate for s
columns is mean
.
If there is only one observation for a group, then keep the observations as it is.
Expected output:
id r s
1 1075 - 1325 645
2 1100 600
CodePudding user response:
Here is one option, using Groupby.agg
, quantile
, and a regex.
NB. I am not fully sure which interpolation method you expect for the quantiles (check the linked documentation, there are several options).
import re
out = (df
.groupby('id')
.agg({'r': lambda x: re.sub(r'(\d (?:\.\d )?) - \1', r'\1',
x.quantile([0.25, 0.75])
.astype(str).str.cat(sep=' - ')),
's': 'mean'})
)
Output:
r s
id
1 1150.0 - 1350.0 645.0
2 1100.0 600.0
CodePudding user response:
Option two:
g_id= df.groupby('id')
g_id['r'].quantile([.25, .76]).unstack().assign(s=df.groupby('id')['s'].agg('mean'))
Output:
0.25 0.76 s
id
1 1150.0 1352.0 645.0
2 1100.0 1100.0 600.0