Pandas Group by and create new column with 25th and 75th percentiles-CodePudding

I have the following pandas DataFrame:

df = pd.DataFrame({
                   'id': [1, 1, 1, 2],
                   'r': [1000, 1300, 1400, 1100],
                   's': [650, 720, 565, 600]
                 })

I'd like to aggregate the DataFrame and create a new column which is a range of r values - 25th and 75th percentiles. The aggregate for s columns is mean.

If there is only one observation for a group, then keep the observations as it is.

Expected output:

id  r               s

1   1075 - 1325     645
2   1100            600

CodePudding user response：

Here is one option, using Groupby.agg, quantile, and a regex.

NB. I am not fully sure which interpolation method you expect for the quantiles (check the linked documentation, there are several options).

import re

out = (df
 .groupby('id')
 .agg({'r': lambda x: re.sub(r'(\d (?:\.\d )?) - \1', r'\1',
                             x.quantile([0.25, 0.75])
                              .astype(str).str.cat(sep=' - ')),
      's': 'mean'})
 )

Output:

                  r      s
id                        
1   1150.0 - 1350.0  645.0
2            1100.0  600.0

CodePudding user response：

Option two:

g_id= df.groupby('id')

g_id['r'].quantile([.25, .76]).unstack().assign(s=df.groupby('id')['s'].agg('mean'))

Output:

      0.25    0.76      s
id                       
1   1150.0  1352.0  645.0
2   1100.0  1100.0  600.0